Categorizes values in a column according to a dictionary table with min/max values. The table at the first input contains a column with the values to be categorized. The second table contains a column with lower bound values, a column with upper bound values and a column with label values. The label is used as the output when a given value lies between the corresponding lower and upper bound. Each row in the second table represents a rule. Rules are evaluated top-down, i.e. a rule in an earlier row takes priority over rules in subsequent rows.
Either the lower or the upper bound test can be disabled by clearing the corresponding checkbox in the dialog. A missing value in the lower or upper bound column always makes the corresponding bound check evaluate to true. That is, a missing value in the lower bound column is always treated as smaller than the value, and a missing value in the upper bound column is always treated as larger. Missing values in the value column (1st input) result in a missing cell in the output (no categorization).
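The rule evaluation described above can be sketched in Python. This is only an illustration under stated assumptions, not the node's actual implementation: the function name `categorize`, its parameters, and the use of `None` for a missing cell are hypothetical, bounds are assumed to be inclusive, and a value that matches no rule is assumed to yield a missing output.

```python
from typing import Optional, Sequence, Tuple

# One rule per row of the second table: (lower bound, upper bound, label).
# None stands in for a missing cell (an assumption of this sketch).
Rule = Tuple[Optional[float], Optional[float], str]


def categorize(
    value: Optional[float],
    rules: Sequence[Rule],
    check_lower: bool = True,   # mirrors the lower bound checkbox
    check_upper: bool = True,   # mirrors the upper bound checkbox
) -> Optional[str]:
    """Return the label of the first matching rule, or None."""
    # A missing input value is not categorized.
    if value is None:
        return None
    # Rules are tried top-down, so the earliest matching row wins.
    for lower, upper, label in rules:
        # A disabled test or a missing bound always passes.
        # Inclusive comparison is an assumption of this sketch.
        lower_ok = (not check_lower) or lower is None or value >= lower
        upper_ok = (not check_upper) or upper is None or value <= upper
        if lower_ok and upper_ok:
            return label
    # No rule matched; returning None here is an assumption.
    return None


rules = [
    (None, 10.0, "low"),    # missing lower bound: open below
    (10.0, 20.0, "mid"),
    (15.0, None, "high"),   # missing upper bound: open above
]

print(categorize(18.0, rules))  # matches "mid" and "high"; the earlier row wins
print(categorize(None, rules))  # missing input stays missing
```

Because the rules overlap between 15 and 20, the example shows why row order matters: the second row is reached first and decides the label.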
Note: The table containing bound and label information (2nd input) will be read into memory during execution; it must be a relatively small table!
- Type: Data. Arbitrary input data with the column to be binned.
- Type: Data. Table containing the categorization rules, with lower bound, upper bound and label columns.
- Type: Data. Input table with an appended column containing the categorization values.