The TextClassifierModelPruner reduces the size of a text classification model by applying different pruning methods. On the one hand, low-frequency terms encountered during training can be removed by setting a minimum frequency threshold. In our experience, setting this threshold to, e.g., "2" roughly halves the number of terms in the model without significantly harming classification quality (your mileage may vary).
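The idea behind frequency-based pruning can be sketched in a few lines of Python. This is only an illustration, not the node's actual implementation: the dictionary layout, the function name `prune_low_frequency_terms`, and the assumption that a term is kept when its total count is greater than or equal to the threshold are all hypothetical.

```python
def prune_low_frequency_terms(term_counts, min_count):
    """Return a copy of the model without rarely seen terms.

    term_counts: hypothetical model layout, {term: {category: count}}.
    min_count:   assumed semantics: keep a term if its total count
                 across all categories is >= min_count.
    """
    return {
        term: dict(category_counts)
        for term, category_counts in term_counts.items()
        if sum(category_counts.values()) >= min_count
    }


# Toy model: two frequent terms and two terms seen only once in training.
model = {
    "the": {"sports": 40, "politics": 38},
    "goalkeeper": {"sports": 3},
    "zlatan": {"sports": 1},
    "filibuster": {"politics": 1},
}

pruned = prune_low_frequency_terms(model, min_count=2)
print(sorted(pruned))
```

In real corpora a large share of terms occurs only once, which is why even a small threshold can remove a substantial part of the model.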

On the other hand, an information-gain-based pruning strategy is available, which scores the terms and their associated category probabilities. A good explanation of the information gain method can be found in "A Comparative Study on Feature Selection in Text Categorization", Yiming Yang and Jan O. Pedersen, 1997.
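As a rough illustration of the scoring, the following Python sketch computes the information gain of a term as the reduction in category entropy obtained by knowing whether the term is present in a document, following the general definition in the paper cited above. The function names, the document-count inputs, and the toy numbers are assumptions made for this example; how the node computes and thresholds the score internally may differ.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs_per_category, docs_with_term):
    """IG(t) = H(C) - P(t) * H(C | t) - P(not t) * H(C | not t).

    docs_per_category: {category: number of training documents}
    docs_with_term:    {category: number of those documents containing t}
    """
    categories = list(docs_per_category)
    n_docs = sum(docs_per_category.values())
    with_term = [docs_with_term.get(c, 0) for c in categories]
    without_term = [docs_per_category[c] - docs_with_term.get(c, 0)
                    for c in categories]
    p_term = sum(with_term) / n_docs
    return (entropy([docs_per_category[c] for c in categories])
            - p_term * entropy(with_term)
            - (1 - p_term) * entropy(without_term))


# Toy corpus: four documents per category.
docs = {"spam": 4, "ham": 4}
# "lottery" occurs only in spam, "hello" is spread evenly.
ig_lottery = information_gain(docs, {"spam": 4, "ham": 0})
ig_hello = information_gain(docs, {"spam": 2, "ham": 2})
print(ig_lottery, ig_hello)
```

A term that separates the categories perfectly receives the highest score, while a term distributed evenly across categories scores zero and is a natural candidate for pruning.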

Input Ports

  1. Type: Text classifier. The model data of the trained classifier.

Output Ports

  1. Type: Text classifier. The pruned model, where terms not satisfying the given properties have been removed.

Find here

Community Nodes > Palladian > Text Classifier

Make sure to have this extension installed:

Palladian for KNIME

Update site for KNIME Analytics Platform 3.7:
KNIME Community Contributions (3.7)
