Hub
Pricing About
WorkflowWorkflow

Merge Files

SpreadsheetsExcelMerge
knime profile image
Versionv2.0Latest, created on 
Jan 23, 2026 8:54 AM
Drag & drop
Like
Download workflow
Workflow preview

Merge Files - Unstructured Data

This workflow merges different unstructured data sources to create a new table.

Data often comes from different sources, and datasets usually have different table structures and are not always structured identically.

In this example, we access data from three different Excel files. The goal is to combine the information contained in the three datasets, however, each file has a different, unstructured format that might be an obstacle when analyzing the data:

  • Olympic_Athlete_Bio.xlsx: Contains an American athlete whose nationality is missing.

  • Olympic_Athlete_Event_Results.xlsx: Contains a header on top that needs to be excluded from the table.

  • Olympic_Games.xlsx: Contains empty columns and rows.

💡 To view each node's configuration, select the node and see the configuration pane on the right side of the workflow editor.

Let's walk through the different nodes involved in this operation:

Excel Reader nodes:

  • Since the folder with the data is already included when you download the workflow, in the "File and Sheet" tab, we choose to "Read from" the "Current workflow data area" and select the dataset.

Variable Creator + Cell Updater nodes:

To fix the missing value in the Olympic_Athlete_Bio.xlsx file:

  • Variable Creator node: Create a flow variable with the value to replace the missing value with.

  • Cell Updater node: Specify the cell which should be replaced by the flow variable (Row 10 in column "country_noc").

Table Cropper nodes:

  • To exclude the empty header rows of the Olympic_Athlete_Event_Results.xlsx file; and

  • to exclude the empty columns and rows of the Olympic_Games.xlsx file.

Value Lookup nodes:

  • To add the gender and date of birth from the athletes' bio information sheet to the event results data; and

  • to add the start and end date from the Summer Olympic Games information sheet to the event results data.

Missing Value node:

  • To exclude rows that have missing values in the columns "sex", "born", "start_date", or "end_date".

  • In fact, after a Value Lookup operation it is very likely that some missing rows might have originated when a correspondent key in the lookup table was not found.

Excel Reader node:

  • We create a new file called "Merged_table" and save it in the workflow data area.

  • After executing the node, the file will open automatically.

Loading deploymentsLoading ad hoc jobs

Used extensions & nodes

Created with KNIME Analytics Platform version 5.9.0
  • Go to item
    KNIME Base nodesTrusted extension

    KNIME AG, Zurich, Switzerland

    Version 5.9.0

    knime profile image
    knime
  • Go to item
    KNIME Excel SupportTrusted extension

    KNIME AG, Zurich, Switzerland

    Version 5.9.0

    knime profile image
    knime

Legal

By using or downloading the workflow, you agree to our terms and conditions.

KNIME
Open for Innovation

KNIME AG
Talacker 50
8001 Zurich, Switzerland
  • Software
  • Getting started
  • Documentation
  • Courses + Certification
  • Solutions
  • KNIME Hub
  • KNIME Forum
  • Blog
  • Events
  • Partner
  • Developers
  • KNIME Home
  • Careers
  • Contact us
Download KNIME Analytics Platform Read more about KNIME Business Hub
© 2026 KNIME AG. All rights reserved.
  • Trademarks
  • Imprint
  • Privacy
  • Terms & Conditions
  • Data Processing Agreement
  • Credits