Handling sparse categorical variables with Word2Vec

This workflow shows how to compute word embeddings for a set of categorical variables, at a granularity that allows the embeddings to be used as input to predictive models. In the second part of the workflow, a principal component analysis (PCA) is applied to the embedding dimensions. Only the components that capture most of the variation are kept, which helps keep model complexity under control. The following Python packages are required: pandas, gensim, numpy, nltk.

Legal

By using or downloading the workflow, you agree to our terms and conditions.