Hub
Pricing About
WorkflowWorkflow

Merge Files

SpreadsheetsExcelMerge
knime profile image
Versionv1.0Latest, created on 
Feb 14, 2024 2:57 PM
Drag & drop
Like
Download workflow
Workflow preview
This workflow merges different unstructured data sources to come up with a new table. Since data come from different data files, we need to blend together information from three sources. All data have been stored in the workflow data area, which means they are saved within the workflow, the consequence is that the reader nodes don't fail because they are configured to read these files in the data area. These excel files have different unstructured formats that could be an obstacle when analyzing the data. The Olympic_Athlete_Bio has an American athlete whose nationality is missing. The cell updater nodes allow substituting this missing value with the one specified in a flow variable. We need to set the "coordinates" of the cell that must be updated, hence the column name/number and the table row. The Olympic_Athlete_Event_Results has a header on top that needs to be excluded from the table. First, we use the option to specify which row should be used to specify the column names then we use the table cropper to exclude useless columns and the header (that has been placed like the first row by the excel reader node). Olympic_Games file presents missing columns and empty rows when read. The table cropper node allows reshaping this table and to get rid of the. The user must specify the interval of columns and rows that he wants to keep. 1) First information to retrieve is the date of birth of the athlete. We used a Value Lookup node to merge this information to the first data source. The key used to merge these files is the "athlete_id". 2)The second Value Lookup adds the starting date of the Olympics edition, which is required to compute the athlete's age. The key used to merge these files is the "edition_id". 3) The missing value node excludes the unmatched rows, in fact, after a Value Lookup operation it is very likely that some missing rows might be originated when a correspondent key in the look-up table is not found. 4)Finally, the table is saved in the workflow data area in the excel format.
Loading deploymentsLoading ad hoc jobs

Used extensions & nodes

Created with KNIME Analytics Platform version 5.0.0
  • Go to item
    KNIME Base nodesTrusted extension

    KNIME AG, Zurich, Switzerland

    Version 5.0.0

    knime profile image
    knime
  • Go to item
    KNIME Excel SupportTrusted extension

    KNIME AG, Zurich, Switzerland

    Version 5.0.0

    knime profile image
    knime

Legal

By using or downloading the workflow, you agree to our terms and conditions.

KNIME
Open for Innovation

KNIME AG
Talacker 50
8001 Zurich, Switzerland
  • Software
  • Getting started
  • Documentation
  • Courses + Certification
  • Solutions
  • KNIME Hub
  • KNIME Forum
  • Blog
  • Events
  • Partner
  • Developers
  • KNIME Home
  • Careers
  • Contact us
Download KNIME Analytics Platform Read more about KNIME Business Hub
© 2025 KNIME AG. All rights reserved.
  • Trademarks
  • Imprint
  • Privacy
  • Terms & Conditions
  • Data Processing Agreement
  • Credits