This workflow shows how to compute word embeddings for a set of categorical variables at a granularity that allows them to be used as input to predictive models.
In the second part of the workflow, a principal component analysis (PCA) is applied to the embedding dimensions. Only the components that capture most of the variation are kept, which helps keep model complexity under control.
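The PCA step can be sketched outside KNIME with NumPy alone. The snippet below is an illustration, not the workflow's actual node configuration: the function name `pca_reduce`, the 90% variance threshold, and the random placeholder matrix are all assumptions chosen for the example.

```python
import numpy as np

# Placeholder for the embedding matrix (rows = records, columns = embedding
# dimensions). Random correlated data is used here only for illustration.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 16)) @ rng.normal(size=(16, 16))


def pca_reduce(X, variance_to_keep=0.9):
    """Project X onto the fewest principal components whose cumulative
    explained variance reaches `variance_to_keep` (hypothetical threshold)."""
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    cumulative = np.cumsum(explained)
    # First position where the cumulative share reaches the threshold.
    k = int(np.searchsorted(cumulative, variance_to_keep)) + 1
    k = min(k, len(S))  # guard against floating-point round-off near 1.0
    return X_centered @ Vt[:k].T, explained[:k]


components, ratios = pca_reduce(embeddings, variance_to_keep=0.9)
print(components.shape, round(float(ratios.sum()), 3))
```

Lowering `variance_to_keep` yields fewer input columns for the downstream model at the cost of discarding more of the embedding's variation.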
The following Python packages are required:
pandas
gensim
numpy
nltk
Workflow
Handling sparse categorical variables with Word2Vec
Created with KNIME Analytics Platform version 4.4.2.