Spreadsheets and Loops
This workflow shows how to append additional sheets to an existing spreadsheet by implementing a loop using KNIME Analytics Platform.
The aim is to create new tables from the input spreadsheet and append them as new sheets, where each sheet is supposed to have a custom name provided by a flow variable. We use the athlete event results spreadsheet that contains data from the 1896 Summer Olympic Games.
💡 To view each node's configuration, select the node and see the configuration pane on the right side of the workflow editor.
Let's walk through the different nodes involved in this operation:
Excel Reader nodes:
Since the folder with the data is already included when you download the workflow, in the "File and Sheet" tab, we choose to "Read from" the "Current workflow data area" and select the dataset.
In the "Data Area" tab, we select to read the "Whole sheet". This configuration allows us to have the sheet as it is. The intent is to respect its original structure.
Group Loop Start node:
The Group Loop Start node groups the table given one or multiple "group" column(s) and processes the data group by group. In this case, we group the athlete event results by column "sport" (the only column selected in the Include panel).
The node also contains iteration information in form of flow variables. In the "Flow Variables" tab in the node monitor, you can see the variables "groupIdentifier", "sport", and "currentIteration". We are interested in the "sport" variable because it represents the group value (i.e., sport) of the current iteration, which we will use for the sheet names.
Variable Expression node:
To create a custom name for each new sheet, we built an expression that combines the current value of the "sport" flow variable with the year of the Olympics ("1896").
We select to "Create" a new flow variable called "sheet-name".
Expression + Row Aggregator + Sorter nodes:
Expression node: We implement a logic that assigns the value 1 if the athlete won a medal (any type), or 0 if no medal was won.
Row Aggregator node: We aggregate the data to calculate the total number of medals won for each athlete:
Category column: athlete
Aggregation method: Sum
Column to include: medal
Weight column: none
Sorter node: We sort the aggregated data by total medal count in descending order.
Excel Writer node:
We append the resulting dataset to a new sheet in the existing dataset. We select to "append" the existing file and use the flow variable "sheet-name" to control the name of the sheet. It changes in each iteration.
To set the sheet name to the flow variable, we go to the "Flow Variables" tab in the configuration window and select the variable "sheet-name" in the "sheet_names" setting.
Variable Loop End node:
To close the loop, we need to add a Loop End node. Since the Excel Writer node does not provide a table output port, we use the Variable Loop End node to close the loop (by creating a flow variable connection).
You can observe the resulting data table by opening the Excel file located in your workflow's data area: /workflow_group/my_workflow/data/. The data folder is stored inside the workflow (stored on your local machine).
As you can see from the output, we appended a sheet for each sport with the defined custom name and the sorted list of athletes' medals count.