Here we transform a collection of documents into numerical vectors. The dataset used in this example is the KNIME Forum Dataset. After the pre-processing phase, the relative term frequency of each term is computed inside the Transformation component. The input dataset is partitioned into a training set and a test set. The term frequencies from the training set are used to build a vector representation of the distinct terms identified by the Bag of Words (BoW) with a Document Vector node. The same Document Vector transformation is then applied to the documents in the test set.
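The workflow above is built from KNIME nodes, but the vectorization step can be sketched in plain Python to show what happens conceptually: relative term frequencies are computed per document, the vocabulary of distinct terms is fixed from the training split only, and the same term-to-column mapping is reused for the test split. The tokenizer, the sample documents, and all function names below are illustrative assumptions, not part of the workflow.

```python
# Illustrative sketch (not the KNIME implementation): relative term
# frequency vectors with a vocabulary learned from the training set.
from collections import Counter

def tokenize(doc):
    # Naive whitespace tokenizer; the workflow's pre-processing
    # phase would normally handle punctuation, stop words, etc.
    return doc.lower().split()

def relative_tf(doc):
    # Relative term frequency: term count divided by document length.
    counts = Counter(tokenize(doc))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}

def to_vector(doc, vocabulary):
    # Terms absent from the training vocabulary are dropped, just as
    # the Document Vector node only emits columns for known terms.
    tf = relative_tf(doc)
    return [tf.get(term, 0.0) for term in vocabulary]

# Hypothetical stand-ins for the forum posts in the real dataset.
train_docs = ["knime forum question about csv reader",
              "document vector node question"]
test_docs = ["new question about the csv reader node"]

# Vocabulary (distinct terms) is built from the training set only.
vocabulary = sorted({t for doc in train_docs for t in tokenize(doc)})

train_vectors = [to_vector(d, vocabulary) for d in train_docs]
test_vectors = [to_vector(d, vocabulary) for d in test_docs]

print(len(vocabulary), train_vectors[0])
```

Note that the test document's vector entries need not sum to one: terms unseen in training (here "new" and "the") contribute mass that the fixed vocabulary cannot represent, which is the expected behaviour when applying a trained transformation to held-out data.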
Created with KNIME Analytics Platform version 4.5.0