Topic Model Goodness of Fit (χ²)

The Topic Model Goodness of Fit (χ²) component calculates the χ² statistic to assess the goodness of fit of a topic model. It operates on the output of LDA or any similar model that produces the Topic Term Weight (TTW) and Document Topic Weight (DTW) matrices. It uses only KNIME nodes, so no external scripting language needs to be installed or configured.

The component follows the methodology implemented in the R library "OpTop" (Optimal Topic Modeling), available on GitHub (https://github.com/contefranz/OpTop). Because the two implementations differ in technical details, results may vary slightly from those of the R library.

The theoretical framework is the statistical test described in Lewis, C. M., & Grossetti, F., "A Statistical Approach for Optimal Topic Model Identification" (Journal of Machine Learning Research, 23(58), 1-20, 2022). The paper introduces the first statistical test for optimal topic model identification and provides the theoretical foundation for the χ² computation.
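To make the idea concrete, the following is a minimal sketch of a generic Pearson χ² goodness-of-fit computation from DTW and TTW matrices. It is not the component's or OpTop's exact procedure, which may differ in how terms are weighted or aggregated. The function name `chi_square_fit` and the `counts` argument (observed term counts per document) are assumptions introduced for illustration; the component itself lists only the DTW and TTW tables as inputs.

```python
def chi_square_fit(dtw, ttw, counts):
    """Return (per-document chi-square list, model chi-square).

    dtw:    D x K document-topic weights, each row sums to 1
    ttw:    K x V topic-term weights, each row sums to 1
    counts: D x V observed term counts (hypothetical extra input)
    """
    n_topics = len(ttw)
    doc_fit = []
    for theta, observed in zip(dtw, counts):
        n_tokens = sum(observed)
        chi2 = 0.0
        for v, obs in enumerate(observed):
            # Model-implied probability of term v in this document.
            p = sum(theta[k] * ttw[k][v] for k in range(n_topics))
            expected = n_tokens * p
            # Skip terms the model assigns zero probability to,
            # to avoid division by zero in this simplified sketch.
            if expected > 0:
                chi2 += (obs - expected) ** 2 / expected
        doc_fit.append(chi2)
    # Assumption: the model-level statistic is the sum over documents.
    return doc_fit, sum(doc_fit)


# Toy data: 2 documents, 2 topics, 3 terms.
dtw = [[1.0, 0.0], [0.5, 0.5]]
ttw = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
counts = [[5, 5, 0], [2, 6, 2]]

doc_fit, model_fit = chi_square_fit(dtw, ttw, counts)
```

In this toy example the first document matches its expected counts exactly, so its statistic is zero, while the second deviates slightly and contributes the whole model-level value. The two return values mirror the "Document fit" and "Model fit" outputs only in spirit.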

Component details


Input ports

DTW (Type: Table)
Document Topic Weight

TTW (Type: Table)
Topic Term Weight

Output ports

Model fit (Type: Table)
χ² for the entire LDA model

Document fit (Type: Table)
χ² for each document
