Numeric Outliers


This node detects and treats outliers in each of the selected columns individually, using the interquartile range (IQR).

To detect outliers in a given column, the first and third quartiles (Q1, Q3) are computed. An observation is flagged as an outlier if it lies outside the range R = [Q1 - k(IQR), Q3 + k(IQR)], where IQR = Q3 - Q1 and k >= 0. With k = 1.5, the smallest value in R typically corresponds to the lower end of a boxplot's whisker and the largest value to its upper end.
If grouping information is provided, outliers are detected only within their respective groups.
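
The detection rule can be sketched in a few lines of Python. This is an illustrative sketch, not the node's implementation: the function names (quantile, iqr_bounds, flag_outliers) are hypothetical, and the quartiles are estimated by simple linear interpolation, which is an assumption and may differ from the estimation method the node uses.

```python
from collections import defaultdict


def quantile(sorted_vals, q):
    """Linearly interpolated quantile of a sorted, non-empty list (assumed method)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) = (Q1 - k*IQR, Q3 + k*IQR) for one group of values."""
    vals = sorted(values)
    q1, q3 = quantile(vals, 0.25), quantile(vals, 0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, groups=None, k=1.5):
    """Flag each value that lies outside the permitted range of its own group."""
    if groups is None:
        groups = [None] * len(values)  # no grouping: one group for all rows
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    bounds = {g: iqr_bounds(vs, k) for g, vs in by_group.items()}
    return [
        not (bounds[g][0] <= v <= bounds[g][1])
        for v, g in zip(values, groups)
    ]


values = [1, 2, 3, 4, 100, 90, 100, 110, 95, 105]
groups = ["a"] * 5 + ["b"] * 5
grouped_flags = flag_outliers(values, groups)
ungrouped_flags = flag_outliers(values)
```

In this example the value 100 in group "a" is flagged because the bounds of that group are (-1, 7), while the value 100 in group "b" is not. Without grouping, the wide spread of the combined data produces bounds that contain every value, so nothing is flagged.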

If an observation is flagged as an outlier, it can either be replaced by another value, or the corresponding row can be removed or retained.

Missing values in the data are ignored, i.e., they are neither used for the outlier computation nor flagged as outliers.
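
Treatment and missing-value handling can be illustrated with a similar sketch. It assumes the permitted interval [lower, upper] has already been computed from the non-missing values, represents missing values as None, and shows two hypothetical strategies: "clip", which replaces an outlier with the nearest permitted bound, and "remove", which drops the row. The function name and strategy names are illustrative assumptions, not the node's option names.

```python
def treat_outliers(values, lower, upper, strategy="clip"):
    """Apply a treatment to outliers; missing values (None) pass through untouched."""
    treated = []
    for value in values:
        if value is None:
            # Missing values are never flagged as outliers.
            treated.append(value)
        elif lower <= value <= upper:
            # Value lies within the permitted range: keep it as is.
            treated.append(value)
        elif strategy == "clip":
            # Replace the outlier with the closest permitted bound.
            treated.append(lower if value < lower else upper)
        elif strategy == "remove":
            # Drop the row that contains the outlier.
            continue
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return treated


column = [1, None, 100, 3, -50]
clipped = treat_outliers(column, lower=-1, upper=7, strategy="clip")
removed = treat_outliers(column, lower=-1, upper=7, strategy="remove")
```

With the bounds (-1, 7), clipping maps 100 to 7 and -50 to -1, while removal drops both rows. The missing value is left in place in either case.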

Input Ports

  1. Type: Data
    Numeric input data to evaluate, plus optional grouping information

Output Ports

  1. Type: Data
    Data table in which outliers were replaced, or from which rows containing outliers or non-outliers were removed
  2. Type: Data
    Data table holding, for each outlier group, the number of members (i.e., non-missing values), the number of outliers, and the lower and upper bounds
  3. Type: Outlier
    Model holding the permitted interval bounds for each outlier group and the outlier treatment specifications

