Movie Recommendation System
Author(s)
Tylah Jenkins (14248037)
Dayhe Kwon (25531308)
Aabhii Taneja (13568821)
Harshit Setia (25206804)
Date: 01/08/2025
Overview & Purpose
This workflow builds a simple movie recommendation system using the MovieLens 100k dataset to predict user preferences and movie ratings.
Data Used
The workflow uses the following MovieLens 100k dataset files: u.data (for the final working model), the uX.base/uX.test files (only if running the hyperparameter tuning), and u.user/u.item (only if running the hybrid model).
Required Location: The user's Desktop is the most convenient place to store the data files.
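As a rough illustration of the data ingestion step outside KNIME, the sketch below parses tab-separated rows in the u.data layout (user id, item id, rating, timestamp). The Rating type and read_ratings function are hypothetical names introduced for this example, and the three sample rows are illustrative only; in practice the handle would be an opened u.data file.

```python
import csv
import io
from typing import NamedTuple


class Rating(NamedTuple):
    """One row of u.data (hypothetical helper type for this sketch)."""
    user_id: int
    item_id: int
    rating: int
    timestamp: int


def read_ratings(handle):
    """Parse tab-separated rating rows from an open text handle."""
    reader = csv.reader(handle, delimiter="\t")
    # Skip blank lines; every field in u.data is an integer.
    return [Rating(*(int(field) for field in row)) for row in reader if row]


# Illustrative rows in the u.data layout (not a substitute for the real file).
sample = io.StringIO(
    "196\t242\t3\t881250949\n"
    "186\t302\t3\t891717742\n"
    "22\t377\t1\t878887116\n"
)
ratings = read_ratings(sample)
```

To read the real file, the same function could be called as read_ratings(open(path_to_u_data)), where path_to_u_data is whatever location the files were placed in.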
Methodology
Workflow steps include:
Data loading (ingestion) and initial inspection
Cleaning & preprocessing (column renaming; the data was already cleaned at source and that cleaning was trusted)
Data splitting (80/20 train/test partition) and test/validation set checks
Spark environment initialisation and required transformations
Model training/prediction (Collaborative Filtering)
Output generation (predicted vs actual)
Evaluation & Analysis
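The 80/20 partition in the steps above can be sketched in plain Python as follows. This is only a minimal illustration and not the KNIME Partitioning node itself; the function name, the fixed seed, and the toy rows are assumptions made for the example.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into train and test lists."""
    shuffled = list(rows)                      # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)      # seeded for repeatable splits
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


# Toy data: 10 users x 10 items, every rating set to 3 (illustrative only).
rows = [(user, item, 3) for user in range(1, 11) for item in range(1, 11)]
train, test = train_test_split(rows)
```

A simple check after splitting is that the two partitions together contain every original row exactly once, which is the kind of validation the workflow performs on its test set.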
How to run the Workflow
Place data files in the specified location.
Open workflow in KNIME.
Select all nodes in the 'Final Working Model' section by dragging the mouse over them, then select execute all.
Outputs
Actual vs. Predicted ratings.
Table of the Top 10 recommendations for each user (unrated movies only).
Evaluation metrics table: RMSE and Recall.
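The Top 10 recommendations output can be pictured as follows: for one user, drop the movies they have already rated and keep the highest predicted ratings. This is a sketch under assumed inputs; top_n_unrated, the item ids, and the predicted scores are all hypothetical and do not come from the workflow.

```python
def top_n_unrated(predicted, already_rated, n=10):
    """Return the n best (item_id, score) pairs among items not yet rated.

    predicted: dict mapping item_id -> predicted rating for a single user.
    already_rated: set of item_ids the user has rated.
    """
    candidates = [
        (item, score)
        for item, score in predicted.items()
        if item not in already_rated
    ]
    # Highest score first; ties broken by item id for a stable order.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:n]


# Hypothetical predictions for one user.
predicted = {101: 4.6, 102: 3.1, 103: 4.9, 104: 2.0}
already_rated = {103}
top = top_n_unrated(predicted, already_rated, n=2)
```

Item 103 has the highest predicted score here but is excluded because the user has already rated it.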
Evaluation Metrics
The system is evaluated using the RMSE and Recall metrics on the u.data set (training/test split).
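For reference, the two metrics could be computed along these lines. RMSE is the standard root mean squared error between actual and predicted ratings. The Recall definition here is an assumption, since the document does not state one: a movie counts as relevant when its actual rating meets a threshold (4.0 is a hypothetical choice), and Recall is the share of relevant movies whose predicted rating also meets it.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating lists."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def recall(actual, predicted, threshold=4.0):
    """Share of relevant items (actual >= threshold) also predicted >= threshold.

    The threshold-based definition is an assumption made for this sketch.
    """
    relevant = [(a, p) for a, p in zip(actual, predicted) if a >= threshold]
    if not relevant:
        return 0.0
    hits = sum(1 for _, p in relevant if p >= threshold)
    return hits / len(relevant)


# Made-up ratings for illustration only.
actual = [5, 3, 4, 1]
predicted = [4.0, 3.0, 3.0, 2.0]
error = rmse(actual, predicted)
hit_rate = recall(actual, predicted)
```

With these made-up values, two movies are relevant and only one of them is predicted at or above the threshold.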
Assumptions
Data files are complete, unaltered, and tab-separated.
Input data has already had the required pre-cleaning applied (every user has at least 20 ratings and complete demographics).
Rating scale is 1-5.
Data provided is reliable and credible.
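Some of these assumptions can be checked before running the workflow. The sketch below tests two of them, the 1-5 rating scale and the minimum of 20 ratings per user, on rows shaped as (user id, item id, rating). The function name and the toy rows are hypothetical and only meant to show the idea.

```python
from collections import Counter


def check_assumptions(ratings, min_per_user=20, low=1, high=5):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    # Assumption: every rating lies on the low-high scale.
    out_of_range = [row for row in ratings if not low <= row[2] <= high]
    if out_of_range:
        problems.append(f"{len(out_of_range)} rating(s) outside {low}-{high}")

    # Assumption: every user has at least min_per_user ratings.
    counts = Counter(row[0] for row in ratings)
    sparse = sorted(user for user, count in counts.items() if count < min_per_user)
    if sparse:
        problems.append(
            f"{len(sparse)} user(s) with fewer than {min_per_user} ratings"
        )
    return problems


# Toy rows: user 1 has 20 valid ratings; user 2 has one invalid rating.
good = [(1, item, 4) for item in range(1, 21)]
bad = good + [(2, 1, 6)]
```

Calling check_assumptions on the good rows returns an empty list, while the bad rows trigger both checks.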