Hub
Pricing About
WorkflowWorkflow

Join and Clean Spreadsheets

SpreadsheetJoinExcel
knime profile image
Versionv2.0Latest, created on 
Jan 23, 2026 8:54 AM
Drag & drop
Like
Download workflow
Workflow preview

Join and Clean Spreadsheets

In this workflow, we see how to join two datasets and how to clean up a table using KNIME Analytics Platform.

Sometimes, more than a VLOOKUP is needed. You may need to compare two datasets by joining them on a shared column key and see which rows they have in common and what is missing in one or the other table.

In this example, we have access to two datasets:

  • The athlete event results of the Summer Olympic Games (1896-2020)

  • The bio information of the athletes who participated in any of the Summer or Winter Olympics

We want to remove all athletes from the athletes' bio information sheet who have never participated in any Summer Olympic Games.

💡 To view each node's configuration, select the node and see the configuration pane on the right side of the workflow editor.

Let's walk through the different nodes involved in this operation:

Excel Reader nodes:

  • Since the folder with the data is already included when you download the workflow, in the "File and Sheet" tab, we choose to "Read from" the "Current workflow data area" and select the dataset.

  • In the "Data Area" tab, we select to read the "Whole sheet". This configuration allows us to have the sheet as it is. The intent is to respect its original structure.

Joiner node:

  • We use the Joiner node to combine both data tables on the "athlete_id" column. The athlete event results data (top input port) is the left table; the athletes' bio information sheet (bottom input port) is the right table. Keep in mind that one athlete could have participated in more than one event or Olympic Games edition.

  • In the "Include in Output" section, we want to include everything (matching rows, left unmatched rows, and right unmatched rows).

  • In the "Output Columns" section, we select which columns to include in the joined output table.

  • Lastly, we select to "split join result into multiple tables". By enabling this option, all three output ports of the Joiner nodes will be active:

    • Top port: The join result (athletes' bio information combined with their event results).

    • Middle port: Left unmatched rows. Information from the athlete event results data of which no bio information is available.

    • Bottom port: Right unmatched rows. Information from the athletes' bio information sheet of which no event data is available (i.e., those are the athletes that have never participated in any Summer Olympics Games).

Column Filter node:

We continue working with the right unmatched rows, i.e., the athletes that have never participated in any Summer Olympics Games.

  • We filter out the columns to keep only the "athlete_id" column. We will use it in the next step as a reference column in the Reference Row Filter node.

Reference Row Filter node:

  • The Reference Row Filter node has two input ports:

    • Top port: The data to be filtered (here: the athletes' bio information sheet).

    • Bottom port: The reference data (here: the list of athletes that have never participated in any Summer Olympics Games).

  • We use "athlete_id" as data and reference column and we exclude the matches. That way, we exclude all athletes from the bio information sheet that have never participated in any Summer Olympics Games.

Excel Writer node:

  • We create a new file called "Olympic_Athlete_Bio_cleaned" and save it in the workflow data area.

  • After executing the node, the file will open automatically.

Loading deploymentsLoading ad hoc jobs

Used extensions & nodes

Created with KNIME Analytics Platform version 5.9.0
  • Go to item
    KNIME Base nodesTrusted extension

    KNIME AG, Zurich, Switzerland

    Version 5.9.0

    knime profile image
    knime
  • Go to item
    KNIME Excel SupportTrusted extension

    KNIME AG, Zurich, Switzerland

    Version 5.9.0

    knime profile image
    knime

Legal

By using or downloading the workflow, you agree to our terms and conditions.

KNIME
Open for Innovation

KNIME AG
Talacker 50
8001 Zurich, Switzerland
  • Software
  • Getting started
  • Documentation
  • Courses + Certification
  • Solutions
  • KNIME Hub
  • KNIME Forum
  • Blog
  • Events
  • Partner
  • Developers
  • KNIME Home
  • Careers
  • Contact us
Download KNIME Analytics Platform Read more about KNIME Business Hub
© 2026 KNIME AG. All rights reserved.
  • Trademarks
  • Imprint
  • Privacy
  • Terms & Conditions
  • Data Processing Agreement
  • Credits