This workflow shows how to compute word embeddings for a set of categorical variables at a granularity that allows them to be used as input to predictive models. In the second part of the workflow, a principal component analysis (PCA) is applied to the embedding dimensions. Only the components that capture most of the variation are kept, which helps keep model complexity under control. The following Python packages are required: pandas, gensim, numpy, nltk.
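The PCA step can be sketched roughly as follows. This is a minimal illustration using only numpy, not the workflow's actual node configuration: the function name `reduce_embeddings`, the `variance_to_keep` threshold of 0.90, and the randomly generated placeholder matrix are all assumptions made for the example. In the workflow itself the input would be the embedding vectors produced in the first part.

```python
import numpy as np


def reduce_embeddings(embeddings, variance_to_keep=0.90):
    """Project embeddings onto the fewest principal components whose
    cumulative explained variance reaches `variance_to_keep`.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    # PCA requires centered data: subtract the mean of each dimension.
    centered = embeddings - embeddings.mean(axis=0)

    # SVD of the centered matrix; the rows of `vt` are the principal axes.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

    # Squared singular values are proportional to the variance per component.
    explained = singular_values ** 2
    ratio = explained / explained.sum()
    cumulative = np.cumsum(ratio)

    # Smallest number of components reaching the requested share of variance.
    n_components = int(np.searchsorted(cumulative, variance_to_keep) + 1)
    n_components = min(n_components, len(ratio))

    reduced = centered @ vt[:n_components].T
    return reduced, ratio[:n_components]


# Placeholder data standing in for real embeddings (assumption):
# 200 rows, 20 embedding dimensions, driven by 3 underlying factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
embeddings = latent @ mixing + 0.01 * rng.normal(size=(200, 20))

reduced, ratio = reduce_embeddings(embeddings, variance_to_keep=0.90)
print(reduced.shape, round(float(ratio.sum()), 3))
```

Choosing the number of components from a variance threshold, rather than fixing it in advance, is one way to trade a small loss of information for fewer input features in the downstream model.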
Created with KNIME Analytics Platform version 4.4.2.