Read Data from different Sources
This workflow shows how to read, merge, and manipulate data from different sources (Google Sheets & Excel file) using KNIME Analytics Platform.
For this example, we have prepared a public Google Sheets spreadsheet called "Olympic_medal_by_country". You can read and overwrite this spreadsheet by authenticating it through a Gmail account. However, the spreadsheet only contains the medal counts of the Summer Olympic Games 1896-2016; the medal counts of the 2020 Summer Olympic Games are saved in a separate Excel file.
The aim of this workflow is to access both datasets with KNIME, merge all data into one data table, perform some data transformation and manipulation tasks, and finally update the public spreadsheet "Olympic_medal_by_country".
💡 To view each node's configuration, select the node and see the configuration pane on the right side of the workflow editor.
Let's walk through the different nodes involved in this operation:
Google Authenticator node:
In the configuration node, we select "Interactive" as Authentication type.
Under "Scopes of access" we select: Scope type: Standard; Scope/permission: Google Sheets as we need to read and write to Google Sheets.
To authenticate, we click the "Login" button in the configuration window and follow the steps in the pop-up window. While in interactive mode, credentials are kept in memory and discarded upon exiting KNIME.
Google Sheets Connector + Google Sheets Reader nodes:
Google Sheets Connector node: This node has no configuration dialog. Simply execute the node to create the connection to Google Sheets.
Google Sheets Reader node: Here, we select the spreadsheet we want to access ("Olympic_medal_by_country"). Make sure, "Select First Sheet" (Sheet 1) and "Has Column Header" are selected. If you want to access your own/a different Google Sheets spreadsheet, click "Select" and choose it.
After executing these two nodes, we have the Olympic medal data available as KNIME data table.
Excel Reader node:
Since the folder with the data is already included when you download the workflow, in the "File and Sheet" tab, we choose to "Read from" the "Current workflow data area" and select the dataset.
In the "Data Area" tab, we select to read the "Whole sheet". This configuration allows us to have the sheet as it is. The intent is to respect its original structure.
String to Number node:
Some of the columns of the spreadsheet were parsed as strings when accessing them in KNIME. We convert them back into numbers (Type: integer) to match the table specifications of the Excel file.
Concatenate node:
Now that both tables have the same structure, we concatenate the Google Sheets spreadsheet and the Excel file.
To account for duplicate RowIDs, we select to "Append suffix". As suffix we set "_2020" which is appended to the RowIDs of the second input port.
Row Aggregator node:
For each country, we calculate the total sum of medals won over all years. We define the following settings:
Category column: country
Aggregation method: Sum
Columns to include in the aggregation: all integer columns
Weight column: none
Make sure to select "Additional "grand totals" at second output port" as we want to append them later to the spreadsheet
Sorter node:
We sort the aggregated country data (top output port of Row Aggregator node) by the total medal count ("total") in descending order.
Column Renamer node:
We rename the columns of the grand totals table (bottom output port of Row Aggregator node) as we want to append the grand totals to the aggregated country data and want to avoid duplicate column names.
Column Appender node:
We append the grand totals to the country data. We select "Generate new RowIDs".
Google Sheets Updater node:
We append the resulting dataset to a new sheet called "medals_updated_by_country" in the existing "Olympic_medal_by_country" spreadsheet. Note: Make sure not to overwrite the original data in "Sheet 1".
After executing the node, the file will open automatically.