Level: Medium
Description:
As a data-driven DJ, you’re tasked with curating the perfect playlist to keep the crowd dancing nonstop during a two-hour event focused on Indian music. The dataset you’re using comprises songs from 15 Indian languages, giving you a diverse range of tracks to work with. Your goal is to select songs with the highest danceability scores, ensuring that each track contributes to an energetic atmosphere throughout the event. You’ll use the dataset from Kaggle to choose the best tracks, sort them by danceability, calculate cumulative durations, and filter the playlist to stay within the two-hour limit.
Beginner-Friendly Objective(s):
Load and preprocess the data (if you struggle to combine the CSV files, you can find a pre-joined dataset in the current workflow's data area folder).
Sort the songs based on their danceability scores, focusing on the highest scores first.
Intermediate-Friendly Objective(s):
Import multiple CSV files using a loop structure.
Sort the songs based on their danceability scores, focusing on the highest scores first.
Convert song durations from "HH:MM" format to total seconds.
Calculate cumulative durations, starting from the top songs.
Filter the songs to ensure the total playlist duration does not exceed two hours.
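For readers who think in code, the "import multiple CSV files using a loop" objective can be sketched outside KNIME in plain Python. This is a minimal sketch, not the workflow itself: the function name load_songs and the extra source_file column are hypothetical, and it assumes every CSV file in the folder has a header row.

```python
import csv
from pathlib import Path


def load_songs(folder):
    """Read every CSV file in `folder` and return all rows as one list of dicts."""
    rows = []
    # A non-recursive glob skips subfolders; sorted() keeps the order deterministic.
    for path in sorted(Path(folder).glob("*.csv")):
        # Skip hidden files and anything that is not a regular file.
        if path.name.startswith(".") or not path.is_file():
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Hypothetical extra column: remember which file the row came from.
                row["source_file"] = path.name
                rows.append(row)
    return rows
```

Note that csv.DictReader returns every value as a string, so numeric columns such as "danceability" would still need converting (for example with float) before sorting.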
Dataset: Spotify Indian Languages Dataset
Solution Summary: The solution is a series of data processing steps that builds a playlist fitting within a two-hour duration. The workflow begins by listing the files in a specified directory and reading the song data from each CSV file. It then sorts the songs by danceability and converts their durations into seconds. The "Moving Aggregator" node is used to calculate cumulative durations, and a filter is applied to ensure the total duration does not exceed two hours. The final output is a table view displaying the list of selected songs.
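The sort, cumulative-sum, and filter steps summarized above can be illustrated with a short Python sketch. It is not the KNIME implementation: the column names song_name, duration_s, and cumulative_s are hypothetical, the four songs are made-up sample data, and durations are assumed to be already converted to seconds.

```python
from itertools import accumulate

LIMIT_SECONDS = 2 * 60 * 60  # two hours = 7200 seconds


def build_playlist(songs, limit=LIMIT_SECONDS):
    """Return the most danceable songs whose running total stays within `limit`."""
    # Highest danceability first.
    ranked = sorted(songs, key=lambda s: s["danceability"], reverse=True)
    # Running total of durations, starting from the top-ranked song.
    running = accumulate(s["duration_s"] for s in ranked)
    # Keep only rows whose cumulative duration is within the limit.
    return [
        dict(song, cumulative_s=total)
        for song, total in zip(ranked, running)
        if total <= limit
    ]


# Made-up sample data, for illustration only.
songs = [
    {"song_name": "A", "danceability": 0.91, "duration_s": 3000},
    {"song_name": "B", "danceability": 0.95, "duration_s": 2500},
    {"song_name": "C", "danceability": 0.80, "duration_s": 2000},
    {"song_name": "D", "danceability": 0.85, "duration_s": 1500},
]
playlist = build_playlist(songs)
print([s["song_name"] for s in playlist])  # ['B', 'A', 'D']
```

Because durations are positive, the running total only grows, so the filter always keeps an unbroken run of songs from the top of the ranking. Here the totals are 2500, 5500, 7000, and 9000 seconds, so song C is dropped.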
Solution Details: The workflow starts with the "List Files/Folders" node, which lists all files in the "songs" directory, excluding subfolders and hidden files. The "CSV Reader" node then reads the song data from each of the 15 CSV files. The "Sorter" node sorts the songs by the "danceability" column in descending order. An "Expression" node converts the "duration" column from "HH:MM" format to total seconds, appending the result as a new column. The "Moving Aggregator" node is used to calculate the cumulative duration in seconds, starting from the top row. The "Row Filter" node ensures the total duration does not exceed 7200 seconds (two hours). Finally, the "Column Filter" node selects relevant columns, and the "Table View" node displays the final playlist.
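The duration conversion performed by the "Expression" node can be sketched in Python as follows. This is an illustration under stated assumptions, not the node's actual expression: the function name duration_to_seconds and its unit_seconds parameter are hypothetical, and the default follows the "HH:MM" label used above by treating the two fields as hours and minutes. If the fields in the data turn out to be minutes and seconds, only the weights need to change.

```python
def duration_to_seconds(text, unit_seconds=(3600, 60)):
    """Convert a two-field, colon-separated duration string to total seconds.

    `unit_seconds` is the weight in seconds of the first and second field.
    The default (3600, 60) reads the string as hours and minutes.
    """
    first, second = text.strip().split(":")
    return int(first) * unit_seconds[0] + int(second) * unit_seconds[1]


print(duration_to_seconds("01:30"))           # 5400 (1 hour 30 minutes)
print(duration_to_seconds("03:45", (60, 1)))  # 225 if read as minutes and seconds
```

Keeping the field weights as a parameter makes the unit assumption explicit, which matters here because the 7200-second limit is only meaningful if the conversion matches the real format of the "duration" column.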