A simple parallel, multi-threaded implementation of LDA (Latent Dirichlet Allocation), following Newman, Asuncion, Smyth and Welling, "Distributed Algorithms for Topic Models", JMLR (2009), with the SparseLDA sampling scheme and data structure from Yao, Mimno and McCallum, "Efficient Methods for Topic Model Inference on Streaming Document Collections", KDD (2009).
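As a rough illustration of the distributed scheme in Newman et al., the sketch below shows only the synchronization step of their approximate distributed LDA: each worker samples against its own copy of the word-topic counts, and the copies are then merged by adding every worker's change to the shared counts. The class and method names (AdLdaMerge, merge) are hypothetical and chosen for this example; this is not MALLET's code, and the Gibbs sampling itself is omitted.

```java
import java.util.List;

/** Illustrative sketch only; names are hypothetical and this is not MALLET's implementation. */
final class AdLdaMerge {

    /**
     * Merges per-worker word-topic counts into new global counts.
     *
     * Assumes every worker started the sweep from a copy of {@code global}
     * and then resampled topics only for its own partition of the documents.
     * The update is: merged[w][k] = global[w][k] + sum over workers of
     * (local[w][k] - global[w][k]).
     */
    static int[][] merge(int[][] global, List<int[][]> locals) {
        int[][] merged = new int[global.length][];
        for (int w = 0; w < global.length; w++) {
            // Start from the counts all workers saw at the beginning of the sweep.
            merged[w] = global[w].clone();
            for (int[][] local : locals) {
                for (int k = 0; k < merged[w].length; k++) {
                    // Add only this worker's change relative to the shared starting point.
                    merged[w][k] += local[w][k] - global[w][k];
                }
            }
        }
        return merged;
    }
}
```

Because each worker only moves its own tokens between topics, the total number of tokens per word is unchanged by the merge.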
The node uses the topic modeling library "MALLET: A Machine Learning for Language Toolkit". Note: The current version of MALLET contains a known multi-threading bug that can cause the node to fail with an ArrayIndexOutOfBoundsException. If you encounter this issue, set the number of threads to one, which should avoid the problem.
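In the node itself the workaround is just a change to the thread setting. For code that drives a multi-threaded trainer directly, the same idea can be sketched as a retry wrapper: run with the requested number of threads and, if an ArrayIndexOutOfBoundsException is thrown, run once more with a single thread. The names SingleThreadFallback and trainWithFallback are hypothetical, and the training call is passed in as a function so that nothing is assumed about MALLET's API.

```java
import java.util.function.IntFunction;

/** Illustrative sketch only; names are hypothetical and no MALLET API is assumed. */
final class SingleThreadFallback {

    /**
     * Calls {@code train} with {@code numThreads}; if that run fails with an
     * ArrayIndexOutOfBoundsException, retries exactly once with one thread.
     *
     * @param train      function that trains a model using the given thread count
     * @param numThreads requested number of threads
     */
    static <T> T trainWithFallback(IntFunction<T> train, int numThreads) {
        try {
            return train.apply(numThreads);
        } catch (ArrayIndexOutOfBoundsException e) {
            if (numThreads <= 1) {
                // Already single-threaded, so the failure has some other cause.
                throw e;
            }
            // Fall back to one thread, trading speed for stability.
            return train.apply(1);
        }
    }
}
```

The retry restarts training from scratch, so a failed multi-threaded run costs its full running time before the fallback begins.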