This workflow shows how to compute word embeddings for a set of categorical variables, at a granularity that allows them to be used as inputs to predictive models.
PCA is applied directly inside the component. I kept only the portion of the embedding dimensions that captures most of the variation; doing so lets you keep model complexity under control.
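The dimension-reduction step can be sketched with numpy alone. This is a minimal example under stated assumptions, not the component's actual code: the function name pca_reduce, the 90% variance threshold, and the synthetic input matrix are all hypothetical.

```python
import numpy as np

def pca_reduce(X, variance_to_keep=0.9):
    """Project X onto the fewest principal components whose cumulative
    explained variance reaches variance_to_keep (hypothetical helper)."""
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    # Index of the first component at which the cumulative ratio reaches the threshold.
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, len(S))  # guard against floating-point shortfall at threshold 1.0
    return X_centered @ Vt[:k].T, explained[:k]

# Synthetic stand-in for an embedding matrix: 50 vectors of size 16
# that mostly vary along 3 latent directions, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 3))
mixing = rng.normal(size=(3, 16))
X = latent @ mixing + 0.01 * rng.normal(size=(50, 16))

reduced, ratios = pca_reduce(X, variance_to_keep=0.9)
```

Raising or lowering the threshold trades retained information against the number of input features passed to the downstream model.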
Required Python packages:
pandas
gensim
numpy
nltk
Workflow
Handling sparse categorical variables with Word2Vec
Used extensions & nodes
Created with KNIME Analytics Platform version 4.5.1.