Calculate Document Distance using Word Vectors

Tags: Deeplearning, Machine learning, Word2vec, Doc2vec, Word vectors
Draft. Latest edits on May 2, 2016 1:30 PM
First, we read in a dataset of sentences and assign each document a unique label. The label is used to create a document vector that represents the whole document rather than single words. Next, we train a Doc2Vec model using the Word Vector Learner node. The Learner node outputs a word vector model containing a vocabulary of all learned words and labels together with their corresponding vectors. The vocabulary can be extracted using a Vocabulary Extractor node, which outputs a column containing the words and a collection column containing the corresponding word vectors at the first output port, and the same for the labels at the second output port. The length of the vector (layer size) as well as other learning parameters can be adjusted in the Word Vector Learner node dialog.

To visualize the result of the Learner, we select six sentences from the training set: five that are very similar to each other and one that is dissimilar to the other five. We then use PCA to reduce the dimensionality of the document vectors to two so that they can be plotted in a scatter plot. In the plot, the sentences are easy to tell apart: the dissimilar sentence lies far from all other sentences, whereas the similar sentences lie close to each other.

Workflow Requirements

  • KNIME Analytics Platform 3.4.0
  • KNIME Deeplearning4J Integration
  • KNIME Deeplearning4J Integration Text Processing Extension
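
The distance comparison described above can be illustrated outside KNIME with a minimal Python sketch. It is not part of the workflow and does not reproduce the Word Vector Learner or the PCA step. It assumes the document vectors have already been extracted, and the labels (DOC_0 to DOC_5) and vector values below are made up purely for illustration. The sketch computes pairwise Euclidean distances and picks the document with the largest mean distance to the others.

```python
import math
from itertools import combinations

# Hypothetical document vectors keyed by document label. In the workflow these
# would come from the label port of the Vocabulary Extractor node; the values
# here are invented so that DOC_5 is clearly different from the other five.
doc_vectors = {
    "DOC_0": [0.90, 0.10, 0.20, 0.05],
    "DOC_1": [0.85, 0.15, 0.25, 0.00],
    "DOC_2": [0.95, 0.05, 0.15, 0.10],
    "DOC_3": [0.88, 0.12, 0.22, 0.03],
    "DOC_4": [0.92, 0.08, 0.18, 0.07],
    "DOC_5": [-0.70, 0.90, -0.60, 0.80],
}


def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair of documents."""
    return {
        (a, b): math.dist(vectors[a], vectors[b])
        for a, b in combinations(vectors, 2)
    }


def mean_distance_to_others(vectors):
    """Average distance from each document to all remaining documents."""
    means = {}
    for label, vec in vectors.items():
        others = [
            math.dist(vec, other)
            for name, other in vectors.items()
            if name != label
        ]
        means[label] = sum(others) / len(others)
    return means


distances = pairwise_distances(doc_vectors)
mean_distances = mean_distance_to_others(doc_vectors)

# The document that is, on average, farthest from all the others.
outlier = max(mean_distances, key=mean_distances.get)

for label, value in sorted(mean_distances.items()):
    print(f"{label}: mean distance to others = {value:.3f}")
print("Most dissimilar document:", outlier)
```

Euclidean distance is used here because it matches what the eye judges in a scatter plot. Cosine distance is a common alternative for word and document vectors and could be swapped in without changing the rest of the sketch. Note that `math.dist` requires Python 3.8 or later.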

External resources

  • KNIME Deeplearning4J Integration

Used extensions & nodes

Created with KNIME Analytics Platform version 4.1.0
  • KNIME Core (Trusted extension)
    KNIME AG, Zurich, Switzerland
    Versions 3.7.1, 4.1.0
  • KNIME JavaScript Views (Trusted extension)
    KNIME AG, Zurich, Switzerland
    Version 4.1.0
  • KNIME Textprocessing - Deeplearning4J Integration (64bit only) (Trusted extension)
    KNIME AG, Zurich, Switzerland
    Version 4.1.0
