SparseGenerativeModel (3.7)

Node / Learner


Generative models for scalable text mining

Provides sparse matrix implementations of the Multinomial Naive Bayes and Multinomial Kernel Density classifiers. Uses hash tables for training and inverted indices for classification.

Implements several options for smoothing, TF-IDF feature transforms, parameter pruning, and model-based feedback. By default, class-conditionals are smoothed with Uniform Jelinek-Mercer smoothing and instance-conditionals with Pitman-Yor Process smoothing.
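To make the default class-conditional smoothing more concrete, here is a minimal sketch of a multinomial Naive Bayes classifier that stores counts in hash tables and interpolates each class-conditional estimate with a uniform distribution over the vocabulary. It is not the package's implementation: the class name `SparseMnbSketch`, the `lambda` weight, and the reading of "Uniform Jelinek-Mercer" as linear interpolation with a uniform background are all assumptions made for illustration. For brevity it scores every class in a plain loop instead of using an inverted index.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; names and structure are hypothetical, not the SGM package's code. */
class SparseMnbSketch {
    // class label -> (term -> summed count); only nonzero counts are stored
    private final Map<String, Map<String, Double>> termCounts = new HashMap<>();
    // class label -> total number of term occurrences seen for that class
    private final Map<String, Double> classTotals = new HashMap<>();
    // class label -> number of training documents
    private final Map<String, Integer> classDocs = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private final double lambda; // assumed interpolation weight, 0 < lambda < 1
    private int numDocs = 0;

    SparseMnbSketch(double lambda) {
        this.lambda = lambda;
    }

    /** Adds one document, given as sparse term counts, to the hash-table model. */
    void train(String label, Map<String, Integer> doc) {
        Map<String, Double> counts = termCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            counts.merge(e.getKey(), (double) e.getValue(), Double::sum);
            classTotals.merge(label, (double) e.getValue(), Double::sum);
            vocabulary.add(e.getKey());
        }
        classDocs.merge(label, 1, Integer::sum);
        numDocs++;
    }

    /** Log prior plus the smoothed log likelihood of the document under one class. */
    double logScore(String label, Map<String, Integer> doc) {
        Map<String, Double> counts = termCounts.get(label);
        double total = classTotals.getOrDefault(label, 0.0);
        double uniform = 1.0 / vocabulary.size();
        double score = Math.log(classDocs.get(label) / (double) numDocs);
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            double ml = total > 0 ? counts.getOrDefault(e.getKey(), 0.0) / total : 0.0;
            // Interpolate the maximum-likelihood estimate with the uniform background
            double p = (1.0 - lambda) * ml + lambda * uniform;
            score += e.getValue() * Math.log(p);
        }
        return score;
    }

    /** Returns the label with the highest score, or null if nothing was trained. */
    String classify(Map<String, Integer> doc) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : termCounts.keySet()) {
            double s = logScore(label, doc);
            if (s > bestScore) {
                bestScore = s;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        SparseMnbSketch model = new SparseMnbSketch(0.1);
        model.train("sports", Map.of("goal", 2, "match", 1));
        model.train("politics", Map.of("vote", 2, "election", 1));
        System.out.println(model.classify(Map.of("goal", 1)));
    }
}
```

Because the uniform term is always positive, a word that never occurred in a class still receives a nonzero probability, so a single unseen word cannot drive a class score to negative infinity.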

Restart Weka after package installation.

For preprocessing, use the Weka StringToWordVector filter with outputWordCounts=True.
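The difference this setting makes can be illustrated with a small stand-alone sketch. With word counts enabled, each term carries how many times it occurs in a document, whereas the boolean alternative records only presence (0 or 1). The code below does not call Weka: the class `WordCountSketch` and its simple tokenizer are hypothetical stand-ins, and the real filter applies its own tokenization and options.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; does not reproduce Weka's StringToWordVector. */
class WordCountSketch {
    /** Term -> number of occurrences, the kind of feature a multinomial model expects. */
    static Map<String, Integer> wordCounts(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // Hypothetical tokenizer: lowercase, split on anything that is not a-z or 0-9
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Term -> 1 for every term that occurs, discarding frequency information. */
    static Map<String, Integer> wordPresence(String text) {
        Map<String, Integer> presence = new TreeMap<>();
        for (String term : wordCounts(text).keySet()) {
            presence.put(term, 1);
        }
        return presence;
    }

    public static void main(String[] args) {
        String text = "Goal after goal in the match";
        System.out.println(wordCounts(text));   // {after=1, goal=2, in=1, match=1, the=1}
        System.out.println(wordPresence(text)); // {after=1, goal=1, in=1, match=1, the=1}
    }
}
```

Multinomial models estimate probabilities from term frequencies, so presence-only features would discard the information they rely on.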

Documentation website: sourceforge.net/p/sgmweka/wiki

References:

Puurula, A. Scalable Text Classification with Sparse Generative Modeling. Proceedings of the 12th Pacific Rim International Conference on Artificial Intelligence. 2012.

Puurula, A. and Bifet, A. Ensembles of Sparse Multinomial Classifiers for Scalable Text Classification. ECML/PKDD PASCAL Workshop on Large-Scale Hierarchical Classification. 2012.

Puurula, A. and Myaeng, S. Integrated Instance- and Class-based Generative Modeling for Text Classification. Proceedings of the Australasian Document Computing Symposium. 2013.

(based on WEKA 3.7)

For further options, click the 'More' button in the dialog.

All Weka dialogs have a panel where you can specify classifier-specific parameters.

Node details

Input ports

Type: Table
Training data

Output ports

Type: Weka 3.7 Classifier
Trained model
