The Topic Model Goodness of Fit (χ2) component calculates the χ2 statistic to assess the goodness of fit of a topic model. It operates on the output of an LDA or any other similar model that outputs the Topic Word Weight (TWW) and Document Topic Weight (DTW) matrices.
It uses only KNIME nodes, so no external scripting language needs to be installed or configured.
This component follows the methodology implemented in the R library "OpTop" (Optimal Topic Modeling), available on GitHub (https://github.com/contefranz/OpTop).
Because the two implementations differ in technical details, results may differ slightly from those produced by OpTop.
The theoretical framework for this component is based on the statistical test described in the paper:
Lewis, C. M., & Grossetti, F. (2022). "A Statistical Approach for Optimal Topic Model Identification." Journal of Machine Learning Research, 23(58), 1-20.
This paper outlines the first statistical test used for optimal topic model identification, providing the theoretical foundation for the χ2 statistic computation.
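As a rough illustration of the kind of computation involved, the following Python sketch evaluates a generic Pearson χ2 statistic from the two weight matrices. It is not the component's KNIME workflow and not OpTop's exact procedure. The function name `pearson_chi2`, the `counts` matrix of observed word counts per document, the matrix layouts (DTW as documents × topics, TWW as topics × words), and the `eps` guard are all assumptions made for this example.

```python
def pearson_chi2(counts, dtw, tww, eps=1e-12):
    """Generic Pearson chi-square between observed and model-implied word counts.

    Assumed layouts (hypothetical, for illustration only):
      counts: documents x words, observed word counts
      dtw:    documents x topics, each row sums to 1
      tww:    topics x words, each row sums to 1
    """
    n_topics = len(tww)
    chi2 = 0.0
    for doc_counts, doc_topics in zip(counts, dtw):
        doc_length = sum(doc_counts)
        for w, observed in enumerate(doc_counts):
            # Model-implied probability of word w in this document.
            prob = sum(doc_topics[k] * tww[k][w] for k in range(n_topics))
            expected = doc_length * prob
            # eps avoids division by zero when the model assigns no mass to a word.
            chi2 += (observed - expected) ** 2 / max(expected, eps)
    return chi2


# Toy data: 2 documents, 2 topics, 3 words.
counts = [[8, 2, 0], [1, 1, 8]]
dtw = [[0.9, 0.1], [0.1, 0.9]]
tww = [[0.8, 0.2, 0.0], [0.0, 0.2, 0.8]]
chi2 = pearson_chi2(counts, dtw, tww)
```

A smaller value means the word frequencies implied by the model are closer to those observed. The statistic is zero only when observed and expected counts coincide.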
- DTW (Type: Table): Document Topic Weight
- TTW (Type: Table): Topic Term Weight