This component uses Python to encode a set of categorical columns into n dimensions.
A Word2Vec model is estimated on multiple concatenated columns, which is useful for mapping categories into a continuous space.
This procedure aims to avoid the overfitting caused by training a model on excessively sparse variables.
Furthermore, numerical predictors are preferred when working with models which are optimized using gradients.
If desired, the dimensionality of the embeddings can be reduced by applying a PCA algorithm.
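The encoding described above can be sketched roughly as follows. This is a minimal illustration, not the component's actual code. It assumes that each row is turned into a "sentence" of column-qualified tokens, that gensim (version 4 or later) provides the Word2Vec model, and that scikit-learn provides the PCA. The function names, the `column=value` token format, the toy rows, and all parameter values are hypothetical.

```python
from typing import Dict, List, Sequence

# Toy rows, purely illustrative: each dict is one record of categorical columns.
rows = [
    {"color": "red", "shape": "circle"},
    {"color": "blue", "shape": "square"},
    {"color": "red", "shape": "square"},
]


def rows_to_sentences(
    rows: Sequence[Dict[str, str]], columns: Sequence[str]
) -> List[List[str]]:
    """Turn each row into a 'sentence' of column-qualified tokens."""
    # Prefixing with the column name keeps equal values in different
    # columns distinct (an assumption, not the component's documented format).
    return [[f"{col}={row[col]}" for col in columns] for row in rows]


def train_embeddings(sentences: List[List[str]], n_dimensions: int = 8, seed: int = 0):
    """Fit Word2Vec on the token sentences and return {token: vector}."""
    from gensim.models import Word2Vec  # third-party, assumed installed (gensim >= 4)

    model = Word2Vec(
        sentences=sentences,
        vector_size=n_dimensions,               # the "n dimensions" of the encoding
        window=max(len(s) for s in sentences),  # treat the whole row as context
        min_count=1,                            # keep rare categories too
        sg=1,                                   # skip-gram variant
        seed=seed,
        workers=1,
    )
    return {token: model.wv[token] for token in model.wv.index_to_key}


def reduce_with_pca(embeddings, n_components: int = 2):
    """Optionally project the embedding vectors onto fewer dimensions."""
    import numpy as np                     # third-party, assumed installed
    from sklearn.decomposition import PCA  # third-party, assumed installed

    tokens = list(embeddings)
    matrix = np.vstack([embeddings[t] for t in tokens])
    reduced = PCA(n_components=n_components).fit_transform(matrix)
    return dict(zip(tokens, reduced))


# Build the Word2Vec input from the toy rows.
sentences = rows_to_sentences(rows, ["color", "shape"])
```

With gensim and scikit-learn installed, `reduce_with_pca(train_embeddings(sentences))` would return one low-dimensional vector per category token. On a real table, the embeddings would typically be fitted on the training set only and then looked up for the test set.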
- Training set (Type: Table): Input columns of the training set
- Test set (Type: Table): Input columns of the test set