VLOOKUP Function - Different Files
This workflow shows how to perform a VLOOKUP function in KNIME Analytics Platform. In this example we perform the VLOOKUP on one table, using a second table from a different file as the dictionary.
The goal is to access a dataset that contains information about athletes who participated in the Olympic Games. However, the dataset contains only the athletes' ID numbers, dates of birth, and three-letter country codes. We want to replace each athlete's ID number with their full name and each country code with the full country name via a VLOOKUP operation. For this purpose, we access a second dataset, the athlete-country dictionary, which contains all athletes' ID numbers with their full names as well as all country codes with the full country names.
💡 To view each node's configuration, select the node and see the configuration pane on the right side of the workflow editor.
Let's walk through the different nodes involved in this operation:
Excel Reader nodes:
Since the folder with the data is already included when you download the workflow, in the "File and Sheet" tab, we choose to "Read from" the "Current workflow data area" and select the dataset.
In the "Data Area" tab, we choose to read the "Whole sheet" and uncheck the option to skip "empty rows". This configuration reads the sheet exactly as it is and preserves its original structure.
Number to String node:
We convert the "athlete_id" column to string. This is required to properly perform the VLOOKUP later because a Value Lookup node requires the two compared columns to have the same data type.
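To see why the types must match, here is a minimal Python sketch of the same idea. It is an analogy, not how KNIME implements the node. The IDs and names are made-up sample values, and it assumes the input sheet holds the IDs as text while the dictionary file stores them as numbers.

```python
# Hypothetical dictionary rows: athlete_id is read as a number.
dictionary_rows = [(101, "Athlete One"), (102, "Athlete Two")]

# Assumption: the input sheet holds the same ID as text.
lookup_value = "101"

# Keys kept as numbers versus keys converted to strings.
numeric_keys = {athlete_id: name for athlete_id, name in dictionary_rows}
string_keys = {str(athlete_id): name for athlete_id, name in dictionary_rows}

# The text "101" is not equal to the number 101, so the first lookup misses.
miss = numeric_keys.get(lookup_value)

# After the conversion both sides are strings and the lookup succeeds.
hit = string_keys.get(lookup_value)
```

Here `miss` is `None` and `hit` is "Athlete One", which mirrors what the Number to String node prevents: a lookup that silently finds nothing.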
Value Lookup nodes:
With the first Value Lookup node, we add the athletes' names to the input data. As the lookup column in the input data table, we select column "A"; as the key column in the dictionary table, we use column "athlete_id". In the include/exclude panel, we only include the column "athlete", as this is the column that contains the athletes' full names.
With the second Value Lookup node, we add the full country names to the input data. As the lookup column in the input data table, we select column "E"; as the key column in the dictionary table, we use column "country_noc". In the include/exclude panel, we only include the column "country", as this is the column that contains the full country names.
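The two lookups can be sketched in plain Python as follows. This is an illustrative analogy under stated assumptions: the helper `value_lookup` is hypothetical, the sample rows are made up, and keeping the first match for duplicate keys is an assumption about the node's configuration.

```python
def value_lookup(rows, lookup_col, dictionary, key_col, value_col):
    """Append value_col from the dictionary to each row (None if no match)."""
    index = {}
    for entry in dictionary:
        # Assumption: when a key occurs several times, keep the first match.
        index.setdefault(entry[key_col], entry[value_col])
    return [{**row, value_col: index.get(row[lookup_col])} for row in rows]


# Made-up input sheet: the header text is an ordinary data row.
input_rows = [
    {"A": "athlete_id", "E": "country_noc"},
    {"A": "101", "E": "USA"},
    {"A": "102", "E": "FRA"},
]

# Made-up dictionary table with the four columns named in the text.
dictionary = [
    {"athlete_id": "101", "athlete": "Athlete One",
     "country_noc": "USA", "country": "United States"},
    {"athlete_id": "102", "athlete": "Athlete Two",
     "country_noc": "FRA", "country": "France"},
]

# First lookup adds "athlete", second lookup adds "country".
with_names = value_lookup(input_rows, "A", dictionary, "athlete_id", "athlete")
with_both = value_lookup(with_names, "E", dictionary, "country_noc", "country")
```

Note that the row holding the header text finds no match in either lookup, so its "athlete" and "country" cells stay empty (`None`).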
Variable Creator + Cell Updater nodes:
You might have noticed that the input data table does not contain proper column names. The column header is placed in Row 3. The two columns "athlete" and "country" which were added via the Value Lookup nodes are missing a header value.
Variable Creator node: We create two flow variables: "athlete-header" with the value "athlete" and "country-header" with the value "country". We will use these flow variables to replace the missing header values in the data table.
Cell Updater nodes: The first Cell Updater node replaces the missing value in column "athlete", Row 3. The second Cell Updater node replaces the missing value in column "country", Row 3.
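As a rough Python analogy of this step, the sketch below stores the two header values as named variables and writes each into one specific cell. The table layout, the RowID "Row3", and the helper `update_cell` are assumptions made for illustration only.

```python
# Flow variables as created in the Variable Creator node.
flow_variables = {"athlete-header": "athlete", "country-header": "country"}

# Made-up table keyed by RowID; the header row has empty lookup cells.
table = {
    "Row3": {"A": "athlete_id", "E": "country_noc",
             "athlete": None, "country": None},
    "Row4": {"A": "101", "E": "USA",
             "athlete": "Athlete One", "country": "United States"},
}


def update_cell(table, row_id, column, value):
    """Return a copy of the table with exactly one cell replaced."""
    updated = {rid: dict(cells) for rid, cells in table.items()}
    updated[row_id][column] = value
    return updated


# One call per Cell Updater node.
step1 = update_cell(table, "Row3", "athlete", flow_variables["athlete-header"])
step2 = update_cell(step1, "Row3", "country", flow_variables["country-header"])
```

After both calls, the header row carries "athlete" and "country" in the two appended columns, while all other cells are unchanged.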
Table Updater node:
At the top port, we pass our original input data (without the lookup values) with the column "A" renamed to "athlete" and the column "E" renamed to "country" to match the column names of the dictionary table. At the bottom port, we pass the sheet with the appended athlete names and country names in the columns "athlete" and "country".
The Table Updater node compares its two inputs and updates the content of matching cells in the top input table. Two cells match if they share the same column name and RowID.
We have now updated our input data table so that the "athlete_id" column is replaced by the athletes' full names and the "country_noc" column is replaced by the full country names.
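The matching rule (same column name and same RowID) can be sketched in Python as shown below. The helper `table_update` and the sample tables, including the column "B" for the date of birth, are hypothetical, and any further options the node offers are not modeled.

```python
def table_update(target, source):
    """Overwrite cells in target whose RowID and column name exist in source."""
    result = {row_id: dict(cells) for row_id, cells in target.items()}
    for row_id, cells in source.items():
        if row_id not in result:
            continue  # no matching RowID in the top table
        for column, value in cells.items():
            if column in result[row_id]:
                result[row_id][column] = value
    return result


# Top port: original data with "A" and "E" renamed to "athlete" and "country".
top = {
    "Row3": {"athlete": "athlete_id", "B": "date_of_birth",
             "country": "country_noc"},
    "Row4": {"athlete": "101", "B": "1990-01-01", "country": "USA"},
}

# Bottom port: sheet with the appended lookup columns and fixed header cells.
bottom = {
    "Row3": {"A": "athlete_id", "B": "date_of_birth", "E": "country_noc",
             "athlete": "athlete", "country": "country"},
    "Row4": {"A": "101", "B": "1990-01-01", "E": "USA",
             "athlete": "Athlete One", "country": "United States"},
}

updated = table_update(top, bottom)
```

Columns "A" and "E" of the bottom table have no counterpart in the top table and are ignored, so the result keeps the original three-column layout with names in place of IDs and codes.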
Excel Writer node:
We append the dataset to a new sheet called "Sheet_1_modified" in the existing Excel file located in the workflow data area.
We uncheck the "Write column headers" checkbox to maintain the original table structure.
After executing the node, the file will open automatically.
As you can see from the output, we have the same sheet structure but now with country names and athlete names instead of country codes and athlete IDs.