Hub
Pricing About
WorkflowWorkflow

Non-standard format Spreadsheets

ExcelSpreadsheet
knime profile image
Versionv2.0Latest, created on 
Jan 23, 2026 8:55 AM
Drag & drop
Like
Download workflow
Workflow preview

Non-standard Spreadsheet Formats

This workflow shows how to work with spreadsheets that come in a non-standard format using KNIME Analytics Platform.

The aim is to clean up and re-arrange a spreadsheet in a non-standard format and then apply a row aggregation. In this example, we access a dataset containing personal information about Olympia athletes and we want to find the total number of male and female athletes born in January 2000.

💡 To view each node's configuration, select the node and see the configuration pane on the right side of the workflow editor.

Let's walk through the different nodes involved in this operation:

Excel Reader node:

  • Since the folder with the data is already included when you download the workflow, in the "File and Sheet" tab, we choose to "Read from" the "Current workflow data area" and select the dataset.

  • In the "Data Area" tab, we select to read the "Whole sheet" and unflag to skip "empty rows". This configuration allows us to read the sheet as it is. The intent is to respect its original structure.

Row to Column Header node:

The column headers of the input table are not placed in the second row.

  • We select the second row of the input table as our new column header.

  • We also remove every row before the new column header by setting the appropriate checkbox "Discard rows before header row".

Table Splitter node:

The input table has a separator row where all column values are equal to "-" (RowID = Row413). We can't use that row in the aggregation later, which is why we need to remove that row.

  • We split the table at this column. We select an arbitrary column as the lookup column and choose the matching criteria as "Equals" and setting the search pattern to "-".

  • We also choose to not include the matching row in the top or the bottom output table. That way, we filter out the separator row.

Cell Extractor + Cell Updater nodes:

The table contains a misplaced date value which is placed in Row5 instead of Row4.

  • Cell Extractor node: We extract the date value from this cell as a flow variable ("extracted_cell").

  • Cell Updater node: We use the previously created flow variable to overwrite the missing date value in row 2 (RowID = Row 4).

Row Filter node:

The table contains two empty rows which we want to remove (i.e., rows with all values missing): Row3 and Row5.

  • We define two filter criteria. As column to filter we set "RowID", as operator we choose "Equals", and as value we set "Row3" and "Row5" respectively.

  • Note: It's important to choose "Match row if matched by any criterion".

  • We select to "output non-matching rows".

Concatenate node:

  • After cleaning the upper table, we concatenate the two tables back together.

Column Filter node:

The table contains an empty column (i.e., a column with all values missing) called "Column3".

  • In the Include/Exclude panel we move "Column3" to the "Exclude" side to remove it from the table.

Table Cropper node:

We need to crop the table so that only athletes born in January 2000 remain, so that we can aggregate the data accordingly.

  • Select the column range ("athlete_id" to "country_noc") and row range (row nr. 1 to row nr. 69).

Row Aggregator + Column Renamer nodes:

We cleaned and re-arranged the spreadsheet. Now we can aggregate the data.

  • Row Aggregator node: We choose "Sex" as the category column and "Occurrence count" as the aggregation method.

  • Column Renamer node: We rename the columns to give them more distinctive names.

Column Appender node:

  • We append the gender occurrence count table (output of the Row Aggregator + Column Renamer node) to the modified athletes' bio information sheet.

  • Since the two tables have different table specs, it's important to select "Generate new RowIDs".

Excel Writer node:

  • We append the dataset to a new sheet called "Sheet_1_modified" in the existing Excel file located in the workflow data area.

  • After executing the node, the file will open automatically.

As you can see from the output, we now have the modified and cleaned data table together with the gender occurrence count table next to it.

Loading deploymentsLoading ad hoc jobs

Used extensions & nodes

Created with KNIME Analytics Platform version 5.9.0
  • Go to item
    KNIME Base nodesTrusted extension

    KNIME AG, Zurich, Switzerland

    Version 5.9.0

    knime profile image
    knime
  • Go to item
    KNIME Excel SupportTrusted extension

    KNIME AG, Zurich, Switzerland

    Version 5.9.0

    knime profile image
    knime

Legal

By using or downloading the workflow, you agree to our terms and conditions.

KNIME
Open for Innovation

KNIME AG
Talacker 50
8001 Zurich, Switzerland
  • Software
  • Getting started
  • Documentation
  • Courses + Certification
  • Solutions
  • KNIME Hub
  • KNIME Forum
  • Blog
  • Events
  • Partner
  • Developers
  • KNIME Home
  • Careers
  • Contact us
Download KNIME Analytics Platform Read more about KNIME Business Hub
© 2026 KNIME AG. All rights reserved.
  • Trademarks
  • Imprint
  • Privacy
  • Terms & Conditions
  • Data Processing Agreement
  • Credits