Computes the inverse document frequency (idf) of each term over the given set of documents and adds a column containing the idf value. Three variants are available: smooth, normalized, and probabilistic idf. The default variant is smooth idf, defined by: idf(t) = log(1 + (f(D) / f(d,t))).
The normalized idf is defined by: idf(t) = log(f(D) / f(d,t)).
The probabilistic idf is defined by: idf(t) = log((f(D) - f(d,t)) / f(d,t)).
In all three formulas, f(D) is the total number of documents and f(d,t) is the number of documents containing term t.
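
The three formulas above can be sketched in a few lines of Python. This is an illustrative sketch only, not the node's implementation: the function name idf_variants, the natural logarithm, and the handling of a term that occurs in every document (where the probabilistic formula takes the log of zero) are all assumptions made for the example.

```python
import math


def idf_variants(documents):
    """Return smooth, normalized, and probabilistic idf per term.

    `documents` is a list of term lists. The name and return shape are
    hypothetical; they only illustrate the formulas in the description.
    """
    n_docs = len(documents)  # f(D): total number of documents

    # f(d,t): number of documents that contain term t at least once
    doc_freq = {}
    for doc in documents:
        for term in set(doc):
            doc_freq[term] = doc_freq.get(term, 0) + 1

    result = {}
    for term, df in doc_freq.items():
        smooth = math.log(1 + n_docs / df)
        normalized = math.log(n_docs / df)
        # Assumption: when a term occurs in every document the ratio is 0
        # and log(0) is undefined, so this sketch returns negative infinity.
        if df < n_docs:
            probabilistic = math.log((n_docs - df) / df)
        else:
            probabilistic = float("-inf")
        result[term] = {
            "smooth": smooth,
            "normalized": normalized,
            "probabilistic": probabilistic,
        }
    return result


if __name__ == "__main__":
    docs = [["a", "b"], ["a", "c"], ["a", "b", "d"]]
    for term, values in sorted(idf_variants(docs).items()):
        print(term, values)
```

Note that the normalized idf of a term occurring in every document is log(1) = 0, while the smooth variant stays positive, which is one reason to prefer smooth idf as a default.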

Input Ports

  1. Type: Data
    The input table, which contains terms and documents.

Output Ports

  1. Type: Data
    The output table, which contains terms, documents, and the corresponding idf value.


This node is part of the extension

KNIME Textprocessing

