Use R to list Excel sheet names, extract the data, and keep only columns that are present in all sheets
Uses the R package readxl to list the sheets of all Excel files in a folder, import them, and guess the column types. In the end, only those columns (and their data) that are present in all files are kept.
I built a solution; please check whether it works for you. With R I list all the sheets in the Excel files from a folder. The sheets are imported and read back into KNIME, and each column's type is guessed from the first 50k lines.
Then I try to find out which combinations of column name and type occur most often (currently that means in every sheet; you might adapt that), and only those columns are kept. Initially, however, all the data is loaded into KNIME, so you can still use it later. The file name and sheet name are stored for later use.
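The import step could look roughly like the following R sketch. It is not the workflow's actual script: the function name `read_all_sheets`, the file pattern, and the shape of the returned list are assumptions made for illustration. Only `list.files()`, `readxl::excel_sheets()` and `readxl::read_excel()` with its `guess_max` argument are real calls.

```r
# Hypothetical helper (not taken from the workflow): read every sheet of
# every Excel file in `folder` and remember where each table came from.
read_all_sheets <- function(folder, guess_max = 50000) {
  # Assumption: files end in .xls or .xlsx
  files <- list.files(folder, pattern = "\\.xlsx?$",
                      full.names = TRUE, ignore.case = TRUE)
  result <- list()
  for (f in files) {
    for (s in readxl::excel_sheets(f)) {
      # guess_max sets how many rows readxl inspects to guess column types
      data <- readxl::read_excel(f, sheet = s, guess_max = guess_max)
      result[[length(result) + 1]] <- list(
        filename   = basename(f),       # stored for later use
        sheet_name = s,                 # stored for later use
        data       = as.data.frame(data)
      )
    }
  }
  result
}
```

Calling `read_all_sheets("path/to/folder")` would return one list entry per sheet; a folder without Excel files yields an empty list.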
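The selection of common columns can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the logic of the KNIME nodes themselves: `keep_common_columns`, its `min_share` parameter, and the toy data frames are hypothetical. A column counts as "the same" only if both its name and its (first) class match.

```r
# Hypothetical helper: keep only columns whose name/type combination
# occurs in at least `min_share` of the sheets (1 = in every sheet).
keep_common_columns <- function(sheets, min_share = 1) {
  # Build one "name|type" key per column of each sheet
  keys <- lapply(sheets, function(df) {
    types <- vapply(df, function(col) class(col)[1], character(1))
    paste(names(df), types, sep = "|")
  })
  counts <- table(unlist(keys))
  common <- names(counts)[counts >= min_share * length(sheets)]
  out <- lapply(seq_along(sheets), function(i) {
    sheets[[i]][, keys[[i]] %in% common, drop = FALSE]
  })
  names(out) <- names(sheets)
  out
}

# Toy data standing in for imported sheets (assumption, not real data)
a <- data.frame(id = 1:2, name = c("x", "y"), extra = c(1.5, 2.5),
                stringsAsFactors = FALSE)
b <- data.frame(id = 3:4, name = c("z", "w"), stringsAsFactors = FALSE)
d <- data.frame(id = c("5", "6"), name = c("u", "v"),
                stringsAsFactors = FALSE)

common_all <- keep_common_columns(list(a = a, b = b, d = d))
```

Here `id` is an integer in two sheets but a character column in the third, so with `min_share = 1` only `name` survives. Lowering `min_share` is one way to "adapt" the rule to columns that appear in most, rather than all, sheets.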
Used extensions & nodes
Created with KNIME Analytics Platform version 4.0.2