This workflow is Team AST’s solution for the Round of 16 challenge in the KNIME Game of Nodes 2024. Some modifications have been made to the originally submitted workflow.
------------------------------------------------------------------------------------------------------------------
This KNIME workflow automates the training of models that predict data science job salaries. It interprets the models using feature importance and LIME, an explainable AI (XAI) technique. The workflow results and model interpretations can be compiled into a PDF report. With minor modifications, the workflow can be adapted to other data sets.
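For readers unfamiliar with LIME, the idea can be sketched outside KNIME in a few lines of Python. This is a simplified illustration, not the workflow's implementation and not the API of KNIME's LIME nodes or the `lime` package: it perturbs one instance, weights the perturbed samples by proximity, and fits a weighted linear surrogate whose coefficients act as local explanations. The function names, the toy model, and the feature names (`seniority`, `remote_ratio`) are hypothetical and chosen only for illustration.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def explain_locally(predict, instance, scales, n_samples=500,
                    kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance.

    Returns one coefficient per feature: the local change in the prediction
    per unit change of that feature.
    """
    rng = random.Random(seed)
    k = len(instance)
    # Accumulate the weighted normal equations (X^T W X) beta = X^T W y.
    xtwx = [[0.0] * (k + 1) for _ in range(k + 1)]
    xtwy = [0.0] * (k + 1)
    for _ in range(n_samples):
        # Perturb in standardized units so every feature moves comparably.
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        sample = [x + s * d for x, s, d in zip(instance, scales, z)]
        # Exponential kernel: samples close to the instance weigh more.
        weight = math.exp(-sum(d * d for d in z) / kernel_width ** 2)
        row = [1.0] + z  # the leading 1.0 is the intercept column
        y = predict(sample)
        for i in range(k + 1):
            xtwy[i] += weight * row[i] * y
            for j in range(k + 1):
                xtwx[i][j] += weight * row[i] * row[j]
    beta = solve_linear_system(xtwx, xtwy)
    # Drop the intercept and convert back to the original feature units.
    return [b / s for b, s in zip(beta[1:], scales)]


def toy_salary_model(features):
    """Hypothetical stand-in for a trained regressor (not the workflow's model)."""
    seniority, remote_ratio = features
    return 50_000 + 3_000 * seniority ** 2 - 100 * remote_ratio


coefficients = explain_locally(toy_salary_model, instance=[2.0, 50.0],
                               scales=[0.5, 10.0])
for name, value in zip(["seniority", "remote_ratio"], coefficients):
    print(f"{name}: {value:+.1f} salary units per feature unit")
```

The coefficients describe the model only near the chosen instance; a different instance can yield different signs and magnitudes. Production LIME implementations add steps omitted here, such as handling categorical features and selecting a small subset of features for the surrogate.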
------------------------------------------------------------------------------------------------------------------
Train, Score and Explain a Machine Learning Model
Challenge description:
You work as a data scientist for a recruiting agency specialized in matching job-seekers in the AI, Data Science and IT space with vacancies in companies that require the services of the recruiting agency. Unfortunately, companies are often reluctant to disclose salaries in job offers. Therefore, in order to attract the best candidates, your boss has tasked you with building a machine learning pipeline to predict data science salaries.
Use the provided dataset on data science jobs to train and score a machine learning model of your choice that predicts data science salaries. Perform the pre-processing operations you deem necessary and select meaningful features to train the model.
Clearly, your boss would like to obtain predictions that are as accurate as possible. Additionally, she expects you to be able to explain the model's decision-making process.
Key requirement: you must use an explainable AI (XAI) technique of your choice to explain the model's predictions and provide a short written description (max. 100 words) in an annotation. For example, you could use one of the KNIME Verified Components on Model Interpretability: https://hub.knime.com/knime/spaces/Examples/00_Components/Model%20Interpretability~WMtQn1U91a-xzZY3/.
Outcome:
A machine learning pipeline for data pre-processing, model training, scoring, and explanation via explainable AI (XAI) techniques.
Deliver your solution as a separate workflow and name it: Solution_Round_16_. Place your solution workflow in the same folder as this challenge workflow.
Teams are strongly encouraged to submit high-quality work in order to improve their chances of getting maximum points. Don't be afraid to go the extra mile! :)
Dataset:
Data Science Salaries 2023 dataset from Kaggle: https://www.kaggle.com/datasets/arnabchaki/data-science-salaries-2023
Deadline:
March 10, 2024 (submission by 11:59 PM CET)**. Check the tournament calendar: https://info.knime.com/game-of-nodes
** We will verify the date and time of the latest edits.
KNIME Game of Nodes:
Rules, Assessment Criteria & FAQs: https://info.knime.com/game-of-nodes