This workflow shows how to compute word embeddings on a set of categorical variables at a granularity that allows them to be used as input to predictive models.
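As a rough Python illustration of the idea (the workflow itself is built from KNIME nodes), the sketch below treats each row's categorical values as one "sentence" and trains a gensim Word2Vec model on them, so categories that co-occur on a row end up with similar vectors. The sample data, column names, and parameters such as vector_size are illustrative assumptions, not the workflow's exact settings.

    import pandas as pd
    from gensim.models import Word2Vec

    # Toy data (assumed for illustration): each row's category values
    # form one "sentence" for Word2Vec.
    df = pd.DataFrame({
        "color": ["red", "blue", "red", "green"],
        "size":  ["S", "M", "S", "L"],
        "brand": ["acme", "acme", "globex", "initech"],
    })
    sentences = df.astype(str).values.tolist()

    # gensim 4.x API; vector_size/window are illustrative choices
    model = Word2Vec(sentences, vector_size=16, window=3, min_count=1, seed=42)

    # Each categorical value can now be replaced by its embedding vector
    vec = model.wv["red"]  # a 16-dimensional numpy array

Each categorical value is then mapped to its embedding vector, turning a sparse one-hot column into a small set of dense numeric columns.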
In the second part of the workflow, a principal component analysis is applied to the embedding dimensions. Only the components that capture most of the variation are kept, which lets you control model complexity.
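Since the listed packages do not include scikit-learn, here is a numpy-only PCA sketch of that reduction step; the 90% variance threshold and the function name are illustrative assumptions, not the workflow's exact settings.

    import numpy as np

    def reduce_embeddings(X, variance_to_keep=0.9):
        """Project rows of X onto the smallest set of principal components
        that together explain at least `variance_to_keep` of the variance."""
        X_centered = X - X.mean(axis=0)
        # SVD of the centered matrix; rows of Vt are principal directions
        U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
        explained = (S ** 2) / (S ** 2).sum()
        # Number of leading components needed to reach the threshold
        k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
        return X_centered @ Vt[:k].T  # shape: (n_samples, k)

    # e.g. shrink 16-dimensional category embeddings before modelling
    X = np.random.default_rng(0).normal(size=(100, 16))
    X_reduced = reduce_embeddings(X)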
The following Python packages are required:
pandas
gensim
numpy
nltk
Workflow
Handling sparse categorical variables with Word2Vec
Used extensions & nodes
Created with KNIME Analytics Platform version 4.4.2