Rule Engine (Dictionary)

Predictor

Applies rules from a rules table to a data table. The rules follow the Rule Engine syntax, though stricter restrictions apply to PMML RuleSets (no column references in the outcome, no regular expressions, 3-valued logic). If no rule matches, the default value specified in the PMML tab is used, or a missing value when no default value was specified.
It takes a list of user-defined rules from the second input port (from the selected column(s)) and tries to match them to each row in the input table. If a rule matches, its outcome value is added into a new column. The first matching rule in order of definition determines the outcome.
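
The first-match behaviour described above can be illustrated with a minimal Python sketch. It is not the node's implementation: the function `apply_rules`, the representation of a rule as a `(condition, outcome)` pair, and the use of `None` for a missing value are all assumptions made for illustration.

```python
from typing import Any, Callable, Optional

# Hypothetical representation: a rule is a (condition, outcome) pair,
# listed in order of definition.
Rule = tuple[Callable[[dict], bool], Any]


def apply_rules(row: dict, rules: list[Rule], default: Optional[Any] = None) -> Any:
    """Return the outcome of the first rule whose condition holds for the row."""
    for condition, outcome in rules:
        if condition(row):
            return outcome  # later rules are not consulted
    return default  # None stands in for a missing value


rules = [
    (lambda r: r["Col0"] > 0, "Positive"),
    (lambda r: r["Col0"] > -10, "Slightly negative"),
]
table = [{"Col0": 5}, {"Col0": -3}, {"Col0": -50}]
results = [apply_rules(row, rules) for row in table]
```

The first row satisfies both conditions but receives only the outcome of the first rule; the last row matches nothing and gets the default.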

Each rule is represented by a row. New line characters are replaced by spaces, even in string constants. To add a comment, start the line in a (condition) cell with //; a comment cannot be placed on the same line as a rule. Anything after // will not be interpreted as a rule. Rules consist of a condition part (antecedent), which must evaluate to true or false, and an outcome (consequent, after the => symbol), which is put into the new column if the rule matches.
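
As a rough illustration of how a rule cell breaks down into its parts, the following Python sketch handles comment lines, new line replacement and the => separator. The helper `parse_rule_line` is hypothetical, and splitting on the first "=>" is a simplification: it does not tokenize the rule, so it would mis-split a condition whose string constant contains "=>".

```python
from typing import Optional


def parse_rule_line(text: str) -> Optional[tuple[str, str]]:
    """Split one rule cell into (condition, outcome); return None for comments."""
    # New line characters are replaced by spaces, as described above.
    line = text.replace("\r\n", " ").replace("\n", " ").strip()
    if not line or line.startswith("//"):
        return None  # comment or empty cell: not interpreted as a rule
    # Simplification (assumption): split on the first "=>" without tokenizing.
    condition, separator, outcome = line.partition("=>")
    if not separator:
        raise ValueError("rule has no '=>' separating condition and outcome")
    return condition.strip(), outcome.strip()


comment = parse_rule_line("// This is a comment")
simple = parse_rule_line('$Col0$ > 0 => "Positive"')
multiline = parse_rule_line("$$ROWINDEX$$ < 3 =>\n    $Col0$")
```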

The outcome of a rule may be any of the following: a string (between quotes " or /), a number, a boolean constant, a reference to another column, or the value of a flow variable. The type of the outcome column is the common super type of all possible outcomes (including those of rules that can never match). If no rule matches, the outcome is a missing value unless a default value is specified.

Columns are given by their name surrounded by $; numbers are given in the usual decimal representation. Note that strings must not contain (double) quotes. Flow variables are represented by $${TypeCharacterAndFlowVarName}$$. (Column references are not supported for PMML outputs.) The TypeCharacter should be 'D' for double (real) values, 'I' for integer values and 'S' for strings.
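
The reference syntax above can be spelled out with a small Python sketch that builds column and flow variable references as text. The helpers `column_ref` and `flow_var_ref` and the mapping from Python types to type characters are illustrative assumptions, not part of the node.

```python
# Type characters as listed above; the Python types used as keys are an assumption.
TYPE_CHARS = {float: "D", int: "I", str: "S"}


def column_ref(name: str) -> str:
    """Return a column reference: the column name surrounded by $."""
    return f"${name}$"


def flow_var_ref(name: str, value: object) -> str:
    """Return a flow variable reference with the type character prepended."""
    type_char = TYPE_CHARS[type(value)]  # KeyError for unsupported types
    return "$${" + type_char + name + "}$$"


col = column_ref("Col0")
string_var = flow_var_ref("FlowVar0", "Market.*")
int_var = flow_var_ref("FlowVar1", 3)
double_var = flow_var_ref("Threshold", 0.5)
```

With these inputs, `string_var` is `$${SFlowVar0}$$` and `int_var` is `$${IFlowVar1}$$`, the same forms used in the example rules.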

The logical expressions can be grouped with parentheses. The precedence rules for them are the following: NOT binds most tightly, followed by AND, then XOR, and finally OR, which binds least tightly. Comparison operators always take precedence over logical connectives. All operators (and their names) are case-sensitive.
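
To make the precedence order concrete, here is a minimal recursive-descent sketch in Python that evaluates an already tokenized logical expression. It is an illustration only: the function `evaluate` is hypothetical, the tokens TRUE and FALSE stand in for comparisons that have already been evaluated, and the sketch uses plain two-valued logic rather than the 3-valued logic that applies to PMML.

```python
def evaluate(tokens: list[str]) -> bool:
    """Evaluate tokens such as ["NOT", "TRUE", "AND", "FALSE"]."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():  # OR binds least tightly
        value = parse_xor()
        while peek() == "OR":
            take()
            rhs = parse_xor()
            value = value or rhs
        return value

    def parse_xor():
        value = parse_and()
        while peek() == "XOR":
            take()
            rhs = parse_and()
            value = value != rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "AND":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():  # NOT binds most tightly
        if peek() == "NOT":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if token in ("TRUE", "FALSE"):
            return token == "TRUE"
        raise ValueError(f"unexpected token: {token}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return result


ungrouped = evaluate("TRUE OR TRUE XOR TRUE".split())
grouped = evaluate("( TRUE OR TRUE ) XOR TRUE".split())
```

Because XOR binds more tightly than OR, the ungrouped expression is read as TRUE OR (TRUE XOR TRUE); the parentheses in the second expression change the result.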

The ROWID represents the row key string, the ROWINDEX is the index of the row (the first row has index 0), while ROWCOUNT stands for the number of rows in the table. (These are not available for PMML.)
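
The relationship between the three row properties can be shown with a short Python sketch. The function `row_properties` and the example row keys are hypothetical; only the meaning of ROWID, ROWINDEX and ROWCOUNT is taken from the description above.

```python
def row_properties(row_keys: list[str]) -> list[dict]:
    """Return the ROWID, ROWINDEX and ROWCOUNT values for each row of a table."""
    row_count = len(row_keys)  # the same value for every row
    return [
        {"ROWID": key, "ROWINDEX": index, "ROWCOUNT": row_count}
        for index, key in enumerate(row_keys)  # the first row has index 0
    ]


properties = row_properties(["Row0", "Row1", "Row2"])
# Rows for which a condition like "$$ROWINDEX$$ < 2" would hold:
first_two = [p["ROWID"] for p in properties if p["ROWINDEX"] < 2]
```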

Some example rules (each should be in one row):

// This is a comment
$Col0$ > 0 => "Positive"
When the values in Col0 are greater than 0, we assign Positive to the result column value (if no previous rule matched).
$Col0$ = "Active" AND $Col1$ <= 5 => "Outlier"
You can combine conditions.
$Col0$ LIKE "Market Street*" AND 
    ($Col1$ IN ("married", "divorced") 
        OR $Col2$ > 40) => "Strange"
$Col0$ MATCHES $${SFlowVar0}$$ OR $$ROWINDEX$$ < $${IFlowVar1}$$ =>
    $Col0$
With parentheses you can combine multiple conditions. The result in the second case comes from one of the columns.
$Col0$ > 5 => $${SCol1}$$
The result can also come from a flow variable.

The following comparisons evaluate to true, where ? denotes a missing value (no other value is less than, greater than, or equal to a missing or NaN value):

  • ? =,<=,>= ?
  • NaN =,<=,>= NaN
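
The comparison behaviour listed above can be sketched in Python as follows. This is an approximation under stated assumptions: `None` stands in for a missing value, the function `compare` is hypothetical, and treating a comparison between a missing value and NaN as false is an assumption, since only the same-kind cases are listed above.

```python
import math
import operator

MISSING = None  # stand-in for a missing value (?)

OPERATORS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def is_nan(value) -> bool:
    return isinstance(value, float) and math.isnan(value)


def compare(left, op: str, right) -> bool:
    """Compare two values, with special handling of missing and NaN values."""
    left_special = left is MISSING or is_nan(left)
    right_special = right is MISSING or is_nan(right)
    if left_special or right_special:
        both_missing = left is MISSING and right is MISSING
        both_nan = is_nan(left) and is_nan(right)
        # Only ? vs ? and NaN vs NaN are true, and only for =, <= and >=.
        return (both_missing or both_nan) and op in ("=", "<=", ">=")
    return OPERATORS[op](left, right)


nan = float("nan")
missing_equal = compare(MISSING, "=", MISSING)
nan_at_least = compare(nan, ">=", nan)
number_below_nan = compare(1.0, "<", nan)
```

Note that this differs from ordinary floating-point semantics, where NaN is not equal to itself.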

Input Ports

  1. Port Type: Data
    Input data
  2. Port Type: Data
    Rules to apply

Output Ports

  1. Port Type: Data
    Table containing the computed column
  2. Port Type: PMML
    Possibly missing PMML port containing the rules in PMML RuleSet format