This workflow illustrates how to implement multivalue one-hot encoding. The encoding is applied to the "genres" column of a table created in KNIME. The column has no missing values, but the workflow still handles that case: a missing value produces a string and a vector containing only zeros, and no additional columns are created.
The prefix and suffix used to name the encoded columns can be set in the Variable Creator node. For example, Adventure is one of the genres in the dataset, so the corresponding encoded column in the sample is named "hasAdventureAsGenre".
The movie and genre lists were extracted from the IMDB 5000 Movie Dataset, which is available on Kaggle at https://www.kaggle.com/datasets/carolzhangdc/imdb-5000-movie-dataset.
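For readers who want to see the idea outside KNIME, the encoding described above can be sketched in plain Python. This is only an illustration, not the workflow's actual node logic: the function name `multivalue_one_hot`, the "|" separator, the sample rows, and the "has" / "AsGenre" prefix and suffix defaults are all assumptions made for the example. A missing cell is treated here as a row with no genres, so it yields an all-zero vector and adds no columns.

```python
def multivalue_one_hot(values, sep="|", prefix="has", suffix="AsGenre"):
    """One-hot encode a column whose cells hold several separated labels.

    Hypothetical helper for illustration; separator, prefix and suffix
    are assumed values, not settings read from the KNIME workflow.
    """
    # Split each cell into labels; a missing (None) or empty cell yields no labels.
    split_rows = [
        [label.strip() for label in cell.split(sep) if label.strip()] if cell else []
        for cell in values
    ]
    # Collect the distinct labels in sorted order so the column order is deterministic.
    labels = sorted({label for row in split_rows for label in row})
    columns = [f"{prefix}{label}{suffix}" for label in labels]
    # Build one 0/1 vector per row; rows without labels become all zeros.
    vectors = [[1 if label in row else 0 for label in labels] for row in split_rows]
    return columns, vectors


# Made-up sample rows; the last one simulates a missing value.
genres = ["Action|Adventure", "Comedy", None]
columns, vectors = multivalue_one_hot(genres)
print(columns)
for row in vectors:
    print(row)
```

Sorting the distinct labels keeps the column order stable between runs, which makes the encoded table easier to compare or join later.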
Workflow
OneHot Coding for Multivalue Variable Sample
Used extensions & nodes
Created with KNIME Analytics Platform version 4.7.1