Efficient JSON Parsing
Challenge details:
For this challenge, time and execution efficiency are key. You are tasked with parsing a JSON file to obtain a data table where each key and its corresponding N-values are parsed into separate columns.
You should go from this JSON structure:
{
"0": [1574, 3773, 3571, ...],
"1": [1193, 376, 73, ...],
...
}
To this parsed data table:
Key | Value 0 | Value 1 | Value 2 | ...
0 | 1574 | 3773 | 3571 | ...
1 | 1193 | 376 | 73 | ...
...
Wrap the data parsing and wrangling operations in a configurable component (use Configuration nodes) so that the user can define the number of values (N) output for each key.
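For illustration only (the challenge itself must be solved with KNIME nodes and without code), the target transformation can be sketched in a few lines of Python. This is a minimal sketch, not a reference solution: the function name json_to_rows and the parameter n_values are hypothetical, and it assumes that "user-defined N-values" means keeping the first N values of each key.

```python
import json


def json_to_rows(text, n_values):
    """Turn {"key": [v0, v1, ...]} into a header plus one row per key.

    Assumption: n_values keeps the FIRST n values of each list.
    """
    data = json.loads(text)
    # Column names follow the "Key | Value 0 | Value 1 | ..." layout.
    header = ["Key"] + [f"Value {i}" for i in range(n_values)]
    # One row per JSON key; each value lands in its own column.
    rows = [[key] + values[:n_values] for key, values in data.items()]
    return header, rows


# Tiny made-up sample in the same shape as the challenge file.
sample = '{"0": [1574, 3773, 3571], "1": [1193, 376, 73]}'
header, rows = json_to_rows(sample, 2)
```

Here n_values plays the role that a Configuration node would play in the component: it is the single user-facing setting that controls how many value columns are produced.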
Rules:
Solve the task in 15 minutes at most.
No coding allowed (e.g., Python, R, Java, etc.)
Requirements:
Import data using relative paths. In particular, place the dataset in the workflow data area and configure the reader node to import the data "Relative to" → "Current workflow data area".
Wrap the data wrangling operations in a configurable component (use Configuration nodes). The component must output the result of those operations as a data table.
Name the component after your team, for example: Team <your_team_name>.
Outcome:
Deliver your solution as a separate workflow and name it: Solution_Finale_<your_team_name>. Place your solution workflow in the same folder as this challenge workflow.
Assessment:
Your team must deliver the correct and most efficient solution:
Correctness: your solution does what we ask you to do in the task. If it does, we check for execution efficiency.
Execution efficiency: your solution must be more efficient (i.e., shorter execution time) than the solution of the other team. We'll run solutions on one of our machines.
Our benchmark is ≈18 sec.
In the event that both teams submit solutions that are perfectly correct and have the exact same execution time, we'll award the team that was faster in submitting the solution (we will verify the time of the latest edits).
In the event that both teams submit solutions that are perfectly correct but largely inefficient (execution time >= 40 sec), we'll award the team that has used the fewest nodes.
In the event that neither of the teams delivers a fully correct solution, we'll award the team that got closer to the correct solution.
Dataset:
Provided in challenge folder.
Deadline:
April 17, 2024 (submission is due 15 min after the round starts)**.
** We will verify the date and time of the latest edits.
Used extensions & nodes
All required extensions are part of the default installation of KNIME Analytics Platform version 5.2.3