Node / Learner

Topic Extractor (Parallel LDA)

Other Data Types / Text Processing / Mining

A simple parallel, multi-threaded implementation of LDA (Latent Dirichlet Allocation), following Newman, Asuncion, Smyth and Welling, "Distributed Algorithms for Topic Models", JMLR (2009), with the SparseLDA sampling scheme and data structure from Yao, Mimno and McCallum, "Efficient Methods for Topic Model Inference on Streaming Document Collections", KDD (2009).

The node uses the topic modeling library "MALLET: A Machine Learning for Language Toolkit". Note: the current version of MALLET contains a known multi-threading bug that can cause the node to fail with an ArrayIndexOutOfBoundsException. If you encounter this issue, setting the number of threads to one should resolve it.
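
To make the distributed scheme cited above more concrete, here is a minimal Python sketch of approximate distributed LDA in the spirit of Newman et al. It is not the node's or MALLET's code: it uses plain collapsed Gibbs sampling instead of the SparseLDA scheme, and the "workers" are simulated one after another rather than run as threads. The function name `ad_lda`, its parameters and the hyperparameter defaults are assumptions made for illustration. Documents are lists of integer word ids.

```python
import random


def ad_lda(docs, num_topics, vocab_size, num_workers=2, iterations=20,
           alpha=0.1, beta=0.01, seed=0):
    """Approximate distributed LDA sketch; workers are simulated sequentially."""
    rng = random.Random(seed)
    # Random initial topic for every token, plus the three count tables.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * num_topics for _ in docs]
    word_topic = [[0] * num_topics for _ in range(vocab_size)]
    topic_total = [0] * num_topics
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            doc_topic[d][k] += 1
            word_topic[w][k] += 1
            topic_total[k] += 1

    # Partition the documents across workers (round-robin).
    shards = [list(range(p, len(docs), num_workers))
              for p in range(num_workers)]

    for _ in range(iterations):
        local_copies = []
        for shard in shards:
            # Each worker samples against its own copy of the global counts.
            wt = [row[:] for row in word_topic]
            tt = topic_total[:]
            for d in shard:
                for i, w in enumerate(docs[d]):
                    k = z[d][i]
                    doc_topic[d][k] -= 1
                    wt[w][k] -= 1
                    tt[k] -= 1
                    # Collapsed Gibbs conditional for this token's topic.
                    weights = [
                        (doc_topic[d][t] + alpha) * (wt[w][t] + beta)
                        / (tt[t] + vocab_size * beta)
                        for t in range(num_topics)
                    ]
                    k = rng.choices(range(num_topics), weights=weights)[0]
                    z[d][i] = k
                    doc_topic[d][k] += 1
                    wt[w][k] += 1
                    tt[k] += 1
            local_copies.append((wt, tt))

        # Merge step: add every worker's change (local minus old global).
        new_wt = [row[:] for row in word_topic]
        new_tt = topic_total[:]
        for wt, tt in local_copies:
            for w in range(vocab_size):
                for t in range(num_topics):
                    new_wt[w][t] += wt[w][t] - word_topic[w][t]
            for t in range(num_topics):
                new_tt[t] += tt[t] - topic_total[t]
        word_topic, topic_total = new_wt, new_tt

    return doc_topic, word_topic, topic_total
```

With `num_workers=1` the sketch reduces to ordinary sequential Gibbs sampling, which is the analogue of running the node with a single thread. With more workers, each one samples against word-topic counts that are slightly stale until the next merge, which is the approximation the distributed scheme accepts in exchange for parallelism.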

Node details

Input ports
  1. Type: Table
    Document table
    Data table with the document collection to analyze. Each row contains one document.
Output ports
  1. Type: Table
    Document table with topics
    The document collection with topic assignments and, for each document, the probability of belonging to each topic.
  2. Type: Table
    Topic terms
    The extracted topics, with the terms of each topic and their weights.
  3. Type: Table
    Iteration statistics
    Table with statistics for each iteration
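
As a rough illustration of what the first two output tables contain, the following Python sketch derives per-document topic probabilities and per-topic term weights from LDA count tables. The helper names `document_topics` and `topic_terms`, the smoothing value `alpha` and the use of raw counts as term weights are assumptions for this example; the node's exact column layout and weight definition are not stated here.

```python
def document_topics(doc_topic, alpha=0.1):
    """Return (assigned_topic, probabilities) per document from topic counts."""
    rows = []
    for counts in doc_topic:
        k = len(counts)
        total = sum(counts) + k * alpha
        # Smoothed share of the document's tokens assigned to each topic.
        probs = [(c + alpha) / total for c in counts]
        best = max(range(k), key=probs.__getitem__)
        rows.append((best, probs))
    return rows


def topic_terms(word_topic, vocab, top_n=3):
    """Return (topic, term, weight) rows, heaviest terms first per topic."""
    num_topics = len(word_topic[0])
    rows = []
    for t in range(num_topics):
        ranked = sorted(range(len(vocab)),
                        key=lambda w: word_topic[w][t], reverse=True)
        # Weight here is the raw count of the term in the topic (assumption).
        rows.extend((t, vocab[w], word_topic[w][t]) for w in ranked[:top_n])
    return rows


# Hypothetical counts: 2 documents, 4 vocabulary terms, 2 topics.
doc_rows = document_topics([[3, 1], [0, 4]], alpha=0.5)
term_rows = topic_terms([[3, 0], [1, 0], [0, 2], [0, 2]],
                        ["a", "b", "c", "d"], top_n=2)
```

Here `doc_rows` holds one entry per document (matching the one-row-per-document input), and `term_rows` holds one entry per topic and term.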
