A Dashboard to Predict Data Science Salaries
Challenge description:
You work as a data scientist for an online job listing portal. Job-seekers mostly use the portal to obtain salary predictions based on a few key pieces of information they provide (e.g., experience level, employment type). You have already trained a Simple Regression Tree that predicts the salary of a job according to different features (e.g., title, experience, location, company size). You have also exported the trained ds_salaries_predictor.model and a table called feature_values.table that contains the features (and all the possible feature values) seen by the model during training.
Your task is to build an interactive and responsive data app that collects user input and outputs the predicted salary. After providing a salary prediction, also collect a few pieces of sensitive user data (e.g., name, surname, email), which could be used, for example, to send personalized newsletters with job offers. Lastly, anonymize the sensitive user data and store it, together with the user input used to predict the salary, in an SQLite database.
Key requirement: you must use the nodes of the Redfield Privacy Nodes extension (https://hub.knime.com/redfield/extensions/se.redfield.arx.feature/latest/) to anonymize the sensitive user data you decided to collect.
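The challenge itself must be solved in KNIME with the Redfield Privacy Nodes. Purely to illustrate the intended data flow (mask the sensitive fields, then store them next to the prediction inputs in SQLite), here is a minimal Python sketch. It is not the Redfield or ARX method: it swaps in salted SHA-256 hashing, which is pseudonymization rather than full anonymization. The table name `submissions`, the column names, the salt value, and the helper names `pseudonymize` and `store_submission` are all hypothetical and not part of the challenge material.

```python
import hashlib
import sqlite3


def pseudonymize(value: str, salt: str) -> str:
    """Return a salted SHA-256 hex digest of a sensitive value.

    Illustrative stand-in only; the challenge requires the Redfield Privacy Nodes.
    """
    normalized = value.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


def store_submission(conn, features, sensitive, predicted_salary, salt):
    """Store prediction inputs plus hashed sensitive fields in SQLite.

    The schema below is an assumption made for this sketch.
    """
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS submissions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            experience_level TEXT,
            employment_type TEXT,
            predicted_salary REAL,
            name_hash TEXT,
            email_hash TEXT
        )
        """
    )
    # Parameterized query: user-supplied values are never concatenated into SQL.
    conn.execute(
        "INSERT INTO submissions "
        "(experience_level, employment_type, predicted_salary, name_hash, email_hash) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            features["experience_level"],
            features["employment_type"],
            predicted_salary,
            pseudonymize(sensitive["name"], salt),
            pseudonymize(sensitive["email"], salt),
        ),
    )
    conn.commit()


if __name__ == "__main__":
    # In-memory database and made-up values, for demonstration only.
    demo_conn = sqlite3.connect(":memory:")
    store_submission(
        demo_conn,
        {"experience_level": "SE", "employment_type": "FT"},
        {"name": "Ada Lovelace", "email": "ada@example.com"},
        120000.0,
        salt="demo-salt",
    )
    print(demo_conn.execute("SELECT * FROM submissions").fetchone())
```

Only the hashes of the name and email reach the database, so the raw values are never written to disk. In the KNIME solution, the Redfield nodes take the place of `pseudonymize`.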
Outcome:
An interactive dashboard that predicts salaries from user input and, in the back end, stores that input together with the anonymized user data in an SQLite database.
Deliver your solution as a separate workflow and name it: Solution_Round_16_<your_team_name>. Place your solution workflow in the same folder as this challenge workflow.
Teams are strongly encouraged to submit high-quality work in order to improve their chances of getting maximum points. Don't be afraid to go the extra mile! :)
Dataset:
The files ds_salaries_predictor.model and feature_values.table are provided in the challenge folder.
Training dataset: Data Science Salaries 2023 dataset from Kaggle: https://www.kaggle.com/datasets/arnabchaki/data-science-salaries-2023. Note that not all features have been used for training.
Deadline:
March 10, 2024 (submission by 11:59 PM CET)**. Check the tournament calendar: https://info.knime.com/game-of-nodes
** We will verify the date and time of the latest edits.
KNIME Game of Nodes:
Rules, Assessment Criteria & FAQs: https://info.knime.com/game-of-nodes
Used extensions & nodes
All required extensions are part of the default installation of KNIME Analytics Platform version 5.2.1