Cluster Abilities, Powers, and Strengths of Superheroes
Challenge description:
You work as a data scientist for a recruiting agency that casts actors for movie production companies. Your employer, though, is unlike any other recruiting agency: the candidates you work with are superheroes with diverse skills, powers, and strengths. A client of the agency, Fantastic Movies Inc., is starting production of a new superhero movie and needs to cast actors. However, the client is still unsure what kind of superhero profile it needs.
Your manager asks you, as the agency's data scientist, to build a clustering analysis pipeline that groups all available superheroes into clusters based on their powers. Perform whatever preprocessing steps you deem necessary and pick a clustering algorithm of your choice. Your manager wants clusters that are as distinctly separated as possible, so the way the clusters are formed must be optimized.
Additionally, you are required to present the results of your clustering analysis visually, both in an interactive dashboard and in a static PDF report. Consider including additional information, visualizations, and metrics that help Fantastic Movies Inc. make its choice (e.g., the top five strengths across all superheroes, statistics on biometric characteristics, etc.).
Key requirement: your clustering pipeline must include a clustering optimization technique, and the dashboard/PDF must contain at least five different visual insights.
Outcome:
A clustering analysis pipeline that groups superheroes by their superpowers, and a report (an interactive dashboard and a static PDF) that displays the identified clusters.
Deliver your solution as a separate workflow named Solution_Round_8_<your_team_name>. Place your solution workflow in the same folder as this challenge workflow.
Teams are strongly encouraged to submit high-quality work in order to improve their chances of getting maximum points. Don't be afraid to go the extra mile! :)
Dataset:
Marvel Superheroes dataset from Kaggle: https://www.kaggle.com/datasets/dannielr/marvel-superheroes?select=superheroes_power_matrix.csv (other datasets on superheroes' characteristics are also available on Kaggle)
Deadline:
March 24, 2024 (submission by 11:59 PM CET)**. Check the tournament calendar: https://info.knime.com/game-of-nodes
** We will verify the date and time of the latest edits.
KNIME Game of Nodes:
Rules, Assessment Criteria & FAQs: https://info.knime.com/game-of-nodes
Used extensions & nodes
All required extensions are part of the default installation of KNIME Analytics Platform version 5.2.2.