This workflow demonstrates how to compute the chi-square statistic to assess goodness of fit in topic modeling. It tests the multinomial assumption behind the LDA model by examining whether the observed and estimated word vectors are statistically indistinguishable.
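As a minimal sketch of the idea, the Pearson chi-square statistic compares observed word counts with the counts expected under an estimated multinomial word distribution. The function name `chi_square_statistic` and the toy numbers are illustrative assumptions, not taken from the workflow or the cited paper, which may define the statistic differently.

```python
def chi_square_statistic(observed_counts, estimated_probs):
    """Pearson chi-square statistic for observed word counts against
    the counts implied by an estimated multinomial distribution."""
    if len(observed_counts) != len(estimated_probs):
        raise ValueError("counts and probabilities must have the same length")
    total = sum(observed_counts)  # total number of word tokens
    stat = 0.0
    for obs, prob in zip(observed_counts, estimated_probs):
        expected = total * prob  # expected count under the fitted model
        if expected <= 0:
            raise ValueError("expected counts must be positive")
        stat += (obs - expected) ** 2 / expected
    return stat


# Toy example (assumed values): a 3-word vocabulary and 100 tokens.
observed = [30, 50, 20]
estimated = [0.25, 0.50, 0.25]
stat = chi_square_statistic(observed, estimated)
dof = len(observed) - 1  # degrees of freedom for a fully specified multinomial
print(stat, dof)
```

A small statistic relative to its degrees of freedom suggests that the observed and estimated word vectors are hard to tell apart; a large one points to a poor fit.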
In this workflow, the parameter we aim to optimize is the number of topics to extract.
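One plausible way to use such a statistic for choosing the number of topics is sketched below: fit a model for each candidate number of topics, collect the per-document chi-square statistics, and keep the candidate with the smallest average. The selection rule, the helper name `select_num_topics`, and the numbers are assumptions for illustration only; the workflow and the cited paper may use a different criterion.

```python
def select_num_topics(stats_by_k):
    """Return the candidate number of topics whose per-document
    chi-square statistics have the smallest mean.

    stats_by_k maps a candidate number of topics to a non-empty list
    of per-document statistics obtained from a model fitted with
    that many topics.
    """
    mean_by_k = {k: sum(v) / len(v) for k, v in stats_by_k.items()}
    return min(mean_by_k, key=mean_by_k.get)


# Placeholder statistics (made up for illustration, not real results).
candidate_stats = {
    5: [3.5, 4.0, 3.0],
    10: [1.0, 1.5, 2.0],
    15: [2.0, 2.5, 1.5],
}
best_k = select_num_topics(candidate_stats)
print(best_k)
```

Averaging is only one way to aggregate the per-document statistics; a median or a count of rejected documents would fit the same structure.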
Lewis, C. M., & Grossetti, F. "A Statistical Approach for Optimal Topic Model Identification" (Journal of Machine Learning Research, 23(58), 1–20, 2022).
Topic Modeling - Goodness of Fit
Created with KNIME Analytics Platform version 5.2.3