Generative models for scalable text mining
Provides sparse matrix implementations for Multinomial Naive Bayes and Multinomial Kernel Density classifiers. Uses hash tables for training and inverted indices for classification.
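The split between hash-table training and inverted-index classification can be illustrated with a minimal Python sketch. This is not the package's code: the function names (train, build_index, classify), the fixed interpolation weight alpha, and the uniform background are assumptions made for illustration. Counts are accumulated in per-class hash tables, then inverted into term postings so that scoring touches only the terms present in the document.

```python
import math
from collections import defaultdict

def train(docs):
    """Accumulate per-class term counts in hash tables.

    docs: iterable of (label, {term: count}) pairs.
    """
    term_counts = defaultdict(lambda: defaultdict(float))  # class -> term -> count
    class_docs = defaultdict(int)                          # class -> number of documents
    for label, bag in docs:
        class_docs[label] += 1
        for term, n in bag.items():
            term_counts[label][term] += n
    return term_counts, class_docs

def build_index(term_counts, class_docs, vocab_size, alpha=0.5):
    """Invert the class tables into term -> [(class, log-ratio)] postings.

    Each posting stores log p(term|class) minus the log of the background
    probability, so classes without a posting implicitly contribute zero.
    alpha and the uniform background are illustrative assumptions.
    """
    background = alpha / vocab_size
    total_docs = sum(class_docs.values())
    priors = {c: math.log(n / total_docs) for c, n in class_docs.items()}
    index = defaultdict(list)
    for label, counts in term_counts.items():
        total = sum(counts.values())
        for term, n in counts.items():
            p = (1 - alpha) * n / total + background
            index[term].append((label, math.log(p) - math.log(background)))
    return index, priors

def classify(bag, index, priors):
    """Score classes by walking only the postings of the document's terms."""
    scores = dict(priors)
    for term, n in bag.items():
        for label, delta in index.get(term, ()):
            scores[label] += n * delta
    # The background term is identical for every class, so it is omitted:
    # it does not change which class has the highest score.
    return max(scores, key=scores.get)
```

For example, after training on two small documents, classify({"ball": 1}, index, priors) visits a single posting list rather than every class-term pair.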
Implements several options for smoothing, TF-IDF feature transforms, parameter pruning, and model-based feedback. The default options smooth class-conditionals using Uniform Jelinek-Mercer and instance-conditionals with Pitman-Yor Process smoothing.
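As a rough illustration of Jelinek-Mercer smoothing with a uniform background, the sketch below linearly interpolates maximum-likelihood term probabilities with a uniform distribution over the vocabulary. This is the textbook form of the technique, not necessarily the package's exact parameterization; the function name jelinek_mercer_uniform and the weight alpha are hypothetical.

```python
def jelinek_mercer_uniform(counts, vocab, alpha=0.5):
    """Interpolate maximum-likelihood estimates with a uniform background.

    counts: {term: count} for one class; vocab: list of all known terms;
    alpha: weight given to the uniform background (an assumed parameter).
    """
    uniform = 1.0 / len(vocab)
    total = sum(counts.values())
    if total == 0:
        # No observations for this class: fall back to the background alone.
        return {term: uniform for term in vocab}
    return {
        term: (1 - alpha) * counts.get(term, 0) / total + alpha * uniform
        for term in vocab
    }
```

Terms unseen in a class receive alpha / len(vocab) rather than zero, so a single unseen term cannot drive the class probability to zero.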
Restart Weka after package installation.
For preprocessing, use the Weka StringToWordVector filter with outputWordCounts=True.
Documentation website: sourceforge.net/p/sgmweka/wiki
References:
Puurula, A. Scalable Text Classification with Sparse Generative Modeling. Proceedings of the 12th Pacific Rim International Conference on Artificial Intelligence. 2012.
Puurula, A. and Bifet, A. Ensembles of Sparse Multinomial Classifiers for Scalable Text Classification. ECML/PKDD PASCAL Workshop on Large-Scale Hierarchical Classification. 2012.
Puurula, A. and Myaeng, S. Integrated Instance- and Class-based Generative Modeling for Text Classification. Proceedings of the Australasian Document Computing Symposium. 2013.
(based on WEKA 3.7)
For further options, click the 'More' button in the dialog.
All Weka dialogs have a panel where you can specify classifier-specific parameters.