A Comprehensive Analysis and Prediction of Zoo Animals
Challenge description:
You work as a biologist at the local zoo and have been observing various animals with the aim of compiling a dataset with different animal attributes. For example, for each animal, the dataset indicates the presence of hair, feathers, whether the animal lays eggs, produces milk, is airborne, aquatic, etc. The director of the zoo has demanded that you conduct a comprehensive analysis of the animals and their class (e.g., mammals, birds, reptiles, fish, etc.) in the zoo, requiring insights to be strongly data-driven. You are free to enrich the dataset with additional data that you deem relevant.
Your task is to create an interactive dashboard where you:
Explore and visualize the data using at least 3 different visualizations, each providing different insights. Prefer less common visualizations (e.g., radar chart, density plot).
Perform a correlation analysis to identify relationships between attributes. Use a visualization technique to display the correlation matrix and comment on the relationships.
Conduct a hypothesis test of your choice to explore the relationship between two attributes (e.g., Chi-Square test of independence). Clearly formulate the H₀ (null hypothesis) and H₁ (alternative hypothesis), and explain the meaning of your test results.
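To make the hypothesis-testing item concrete, here is a minimal Python sketch of a Chi-Square test of independence for two binary attributes. It only illustrates the statistics; the actual solution is expected to be built with KNIME nodes. The function name chi_square_2x2 and the toy observations are hypothetical and are not taken from zoo.csv. Here H₀ states that the two attributes are independent and H₁ states that they are associated; the sketch applies no continuity correction.

```python
import math
from collections import Counter


def chi_square_2x2(xs, ys):
    """Pearson chi-square test of independence for two binary (0/1) attributes.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    observed = Counter(zip(xs, ys))   # counts per (x, y) cell
    row_totals = Counter(xs)          # marginal counts of the first attribute
    col_totals = Counter(ys)          # marginal counts of the second attribute
    statistic = 0.0
    for r in (0, 1):
        for c in (0, 1):
            # Expected count under H0 (independence).
            expected = row_totals[r] * col_totals[c] / n
            if expected == 0:
                raise ValueError("an attribute is constant; the test is undefined")
            statistic += (observed[(r, c)] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Made-up toy observations for illustration only, NOT taken from zoo.csv.
hair = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
milk = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
stat, p = chi_square_2x2(hair, milk)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

If p falls below the chosen significance level (commonly 0.05), H₀ is rejected in favor of H₁; otherwise the data do not provide enough evidence of an association. For two binary attributes the statistic also ties in with the correlation analysis: the absolute value of the phi coefficient equals sqrt(chi2 / n).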
Additionally, train and optimize an ML model of your choice to predict the class of each animal based on its attributes. Evaluate the model's performance and assess the uncertainty of its predictions using conformal prediction theory. Display and comment on the results in the dashboard.
Lastly, generate a 1-page, static PDF report with the key insights of your analysis for easy distribution across the zoo personnel.
Key requirement: You must rely on the Redfield Conformal Prediction Nodes and on the KNIME Reporting extension.
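For readers new to conformal prediction, the idea behind the uncertainty assessment can be sketched in a few lines of Python. This is a generic split conformal classifier written under stated assumptions; it is not the implementation of the Redfield Conformal Prediction Nodes, which the solution must still use. The function names conformal_threshold and prediction_set, the probabilities, and the class labels below are hypothetical, and the nonconformity score (1 minus the predicted probability of a class) is one common choice among several.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate split conformal prediction.

    cal_probs: list of dicts mapping class label -> predicted probability.
    cal_labels: true labels of the calibration rows.
    Returns the score threshold q_hat for a target error rate alpha.
    """
    # Nonconformity score: 1 minus the probability assigned to the true class.
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration rows: every class is kept
    return scores[rank - 1]


def prediction_set(probs, q_hat):
    """Return every class whose nonconformity score does not exceed q_hat."""
    return {label for label, p in probs.items() if 1.0 - p <= q_hat}


# Made-up calibration output of some already-trained classifier (illustration only).
true_class_probs = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.6, 0.5, 0.3]
cal_probs = [{"mammal": p, "bird": 1 - p} for p in true_class_probs]
cal_labels = ["mammal"] * len(cal_probs)

q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.2)

confident = prediction_set({"mammal": 0.55, "bird": 0.4, "fish": 0.05}, q_hat)
ambiguous = prediction_set({"reptile": 0.5, "amphibian": 0.5}, q_hat)
print(q_hat, confident, ambiguous)
```

A prediction set with a single class signals a confident prediction, while a set with several classes (or none) flags an animal the model is unsure about. Provided the calibration rows and new rows are exchangeable, the sets contain the true class with probability at least 1 - alpha, which is the kind of statement the dashboard commentary can build on.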
Outcome:
An interactive dashboard that includes visualizations, correlation analysis, hypothesis testing results, and evaluation of a predictive model. A 1-page static PDF report with key insights.
Deliver your solution as a separate workflow and name it: Solution_Round_4_<your_team_name>. Place your solution workflow in the same folder as this challenge workflow.
Teams are strongly encouraged to submit high-quality work in order to improve their chances of getting maximum points. Don't be afraid to go the extra mile! :)
Dataset:
Zoo Animal Classification dataset sourced from Kaggle (download the zoo.csv file): https://www.kaggle.com/datasets/uciml/zoo-animal-classification/data
Deadline:
February 16, 2025 (submission by 11:59 PM CET)**. Check the tournament calendar: https://info.knime.com/game-of-nodes
** We will verify the date and time of the latest edits.
KNIME Game of Nodes:
Rules, Assessment Criteria & FAQs: https://info.knime.com/game-of-nodes