Computes the inverse document frequency (idf) of each term with respect to the given set of documents and appends a column containing the idf value. Three variants are available: smooth, normalized, and probabilistic idf. The default variant is the smooth idf, defined by: idf(t) = log(1 + (f(D) / f(d,t))).
The normalized idf is defined by: idf(t) = log(f(D) / f(d,t)).
The probabilistic idf is defined by: idf(t) = log((f(D) - f(d,t)) / f(d,t)).
In all three formulas, f(D) is the total number of documents and f(d,t) is the number of documents containing term t.
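To make the three formulas above concrete, here is a minimal Python sketch that evaluates them on a toy corpus. It is an illustration only, not the node's implementation. The function names (document_frequencies, idf_variants) and the sample documents are hypothetical. The sketch assumes the natural logarithm, since the description does not state the base, and it returns None for the probabilistic variant when a term occurs in every document, because log(0) is undefined and the node's behavior in that case is not described here.

```python
import math
from collections import Counter


def document_frequencies(documents):
    """Count, for each term, the number of documents containing it: f(d,t)."""
    counts = Counter()
    for doc in documents:
        # Use a set so a term is counted at most once per document.
        counts.update(set(doc))
    return counts


def idf_variants(num_docs, docs_with_term):
    """Evaluate the three idf formulas for one term.

    num_docs       -- f(D), the total number of documents
    docs_with_term -- f(d,t), the number of documents containing the term
    Assumption: natural logarithm (the base is not given in the description).
    """
    ratio = num_docs / docs_with_term
    docs_without_term = num_docs - docs_with_term
    return {
        "smooth": math.log(1 + ratio),
        "normalized": math.log(ratio),
        # log(0) is undefined; this sketch reports None in that case.
        "probabilistic": (
            math.log(docs_without_term / docs_with_term)
            if docs_without_term > 0
            else None
        ),
    }


# Hypothetical toy corpus: each document is a list of terms.
documents = [
    ["knime", "text", "mining"],
    ["text", "processing"],
    ["text", "frequency"],
    ["knime", "node"],
]

frequencies = document_frequencies(documents)
for term in sorted(frequencies):
    print(term, idf_variants(len(documents), frequencies[term]))
```

Note that the probabilistic variant becomes negative for terms that occur in more than half of the documents, whereas the smooth and normalized variants never drop below zero.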
- Type: Data. The input table, which contains terms and documents.
- Type: Data. The output table, which contains terms, documents, and the corresponding idf value.
Other Data Types > Text Processing > Frequencies
Make sure to have this extension installed:
Update site for KNIME Analytics Platform 3.7:
KNIME Analytics Platform 3.7 Update Site