Topic Extractor (Parallel LDA)



A simple parallel, multi-threaded implementation of LDA (latent Dirichlet allocation) following Newman, Asuncion, Smyth and Welling, "Distributed Algorithms for Topic Models", JMLR (2009), with the SparseLDA sampling scheme and data structure from Yao, Mimno and McCallum, "Efficient Methods for Topic Model Inference on Streaming Document Collections", KDD (2009).
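The core idea of the SparseLDA scheme cited above is that the unnormalized collapsed Gibbs conditional for a token of word w in document d, (alpha_t + n_{t|d}) (beta + n_{w|t}) / (beta V + n_t), can be split into three additive "buckets": a smoothing-only term s, a document-topic term r that is nonzero only for topics present in the document, and a topic-word term q that is nonzero only for topics the word is currently assigned to. The following is a minimal, illustrative sketch of that decomposition; the function names are hypothetical and it is not MALLET's code. For clarity it computes every bucket densely. The speedup described in the paper comes from caching s and iterating only over the nonzero entries of r and q, which this sketch does not do.

```python
import random


def sparse_lda_buckets(alpha, beta, vocab_size, n_td, n_wt, n_t):
    """Split the unnormalized LDA full conditional for one token into the three
    SparseLDA buckets (s, r, q).

    alpha:      per-topic Dirichlet priors alpha_t
    beta:       symmetric topic-word prior
    vocab_size: number of word types V
    n_td:       tokens in the current document assigned to each topic
    n_wt:       tokens of the current word type assigned to each topic
    n_t:        total tokens assigned to each topic
    All counts must exclude the token being resampled.
    """
    s, r, q = [], [], []
    for t in range(len(alpha)):
        denom = beta * vocab_size + n_t[t]
        # s: "smoothing only" mass, independent of the document and the word
        s.append(alpha[t] * beta / denom)
        # r: nonzero only for topics that occur in the current document
        r.append(n_td[t] * beta / denom)
        # q: nonzero only for topics the current word is assigned to
        q.append((alpha[t] + n_td[t]) * n_wt[t] / denom)
    return s, r, q


def sample_topic(alpha, beta, vocab_size, n_td, n_wt, n_t, u):
    """Map a uniform draw u in [0, 1) to a topic by walking q, then r, then s."""
    s, r, q = sparse_lda_buckets(alpha, beta, vocab_size, n_td, n_wt, n_t)
    x = u * (sum(s) + sum(r) + sum(q))
    # q is checked first because it typically holds most of the probability mass.
    for bucket in (q, r, s):
        for t, mass in enumerate(bucket):
            if x < mass:
                return t
            x -= mass
    return len(alpha) - 1  # guard against floating-point round-off


if __name__ == "__main__":
    # Three topics; the current word has so far only been assigned to topic 1.
    print(sample_topic([0.1] * 3, 0.01, 100, [2, 0, 1], [0, 3, 0], [10, 20, 5], random.random()))
```

Because s + r + q equals the full conditional term by term, sampling from the buckets draws from exactly the same distribution as the standard sampler. Only the cost of computing it changes.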

The node uses the topic modeling library of MALLET ("MALLET: A Machine Learning for Language Toolkit"). Note: the current version of MALLET contains a known multi-threading bug that can cause the node to fail with an ArrayIndexOutOfBoundsException. If you encounter this issue, setting the number of threads to one should solve it.
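In the approach of Newman et al. cited above (AD-LDA), the documents are split across P workers. Each worker runs a Gibbs sweep over its own documents against a local copy of the global topic-word counts, and afterwards the local changes are summed back into the global table. Because the workers sample against slightly stale counts, the parallel sampler only approximates the sequential one. With a single worker the merge simply takes over that worker's counts, so the one-thread workaround mentioned above falls back to ordinary sequential Gibbs sampling. The sketch below illustrates only the merge step. It assumes that threads play the role of AD-LDA workers, and the function name is hypothetical. Whether MALLET synchronizes in exactly this way is not stated on this page.

```python
def ad_lda_merge(global_counts, local_counts):
    """AD-LDA style synchronization after one parallel Gibbs sweep:
    new_global = global + sum over workers p of (local_p - global).

    global_counts: dict mapping (word, topic) -> count before the sweep
    local_counts:  one dict per worker; each started as a copy of global_counts
                   and was updated by that worker's sweep over its own documents
    """
    merged = dict(global_counts)
    for local in local_counts:
        for key in set(local) | set(global_counts):
            # Add only the change this worker made relative to the shared start state.
            delta = local.get(key, 0) - global_counts.get(key, 0)
            merged[key] = merged.get(key, 0) + delta
    return merged
```

For example, if one worker moves a token of "bank" from topic 1 to topic 0 and another worker makes the opposite move in its own documents, the two changes cancel and the global table is unchanged.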

Node details


Input ports

Type: Table
Document table
Data table with the document collection to analyze. Each row contains one document.

Output ports

Type: Table
Document table with topics
The document collection with topic assignments and, for each document, the probability of belonging to each topic.
Type: Table
Topic terms
The topics, each with its terms and their weights.
Type: Table
Iteration statistics
Table with statistics for each iteration.
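This page does not specify how the document-topic probabilities, topic assignments, or term weights in the output tables are computed. The following is a minimal sketch that assumes common estimates for a collapsed Gibbs sampler: the probability of topic t for a document is its smoothed count, (n_{t|d} + alpha_t) / (N_d + sum of alpha). It further assumes that the assigned topic is the most probable one and that a term's weight is its count in the topic. These are assumptions rather than confirmed behavior of this node, and all function names are hypothetical.

```python
def document_topic_probabilities(topic_counts, alpha):
    """Smoothed estimate theta_{d,t} = (n_{t|d} + alpha_t) / (N_d + sum_t alpha_t),
    where topic_counts[t] is the number of the document's tokens assigned to topic t."""
    total = sum(topic_counts) + sum(alpha)
    return [(n + a) / total for n, a in zip(topic_counts, alpha)]


def assign_topic(probabilities):
    """Pick the most probable topic (index of the largest probability)."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


def top_terms(term_counts, k):
    """Return the k terms with the highest count in one topic as (term, weight)
    pairs, breaking ties alphabetically."""
    return sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    probs = document_topic_probabilities([3, 0, 1], [0.5, 0.5, 0.5])
    print(probs, assign_topic(probs))
    print(top_terms({"apple": 5, "bank": 2, "car": 5}, 2))
```

The prior alpha keeps every topic's probability above zero, even for topics with no tokens in the document, and the probabilities of each document sum to one.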

Extension

The Topic Extractor (Parallel LDA) node is part of this extension:

