Simple Regression Tree Learner

Node / Learner

Learns a single regression tree. The procedure follows the algorithm described in "Classification and Regression Trees" (Breiman et al., 1984), with a few simplifications in the current implementation, for example no pruning and trees that are not necessarily binary.

In a regression tree the predicted value for a leaf node is the mean target value of the records within the leaf. Hence the predictions are best (with respect to the training data) if the variance of target values within a leaf is minimal. This is achieved by splits that minimize the sum of squared errors in their respective children.
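The split criterion can be illustrated with a minimal sketch. It is not the node's actual implementation: the function names `sse` and `best_numeric_split` are hypothetical, and the sketch assumes a single numeric feature and a binary split of the form x <= threshold, whereas the node itself may produce non-binary splits.

```python
from statistics import fmean


def sse(values):
    """Sum of squared errors around the mean; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def best_numeric_split(xs, ys):
    """Return (threshold, children_sse) for the binary split x <= threshold
    that minimizes the summed SSE of both children, or None if the feature
    has only one distinct value."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Toy data (made up for illustration): two clearly separated groups.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
threshold, children_sse = best_numeric_split(xs, ys)
print(threshold, children_sse)
```

On this toy data the chosen threshold falls between the two groups, and each child would predict the mean of its own target values.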

The current missing value handling also differs from the one used by Breiman et al. (1984). In each split, the algorithm tries to find the best direction for missing values by sending them in each direction and selecting the one that yields the best result (i.e. the largest gain). The procedure is adapted from the well-known XGBoost algorithm.
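The direction search for missing values can be sketched as follows. This is an illustration under stated assumptions, not the node's code: `best_missing_direction` is a hypothetical name, missing values are represented as `None`, the split is a fixed binary threshold on one numeric feature, and gain is taken to be the reduction in the sum of squared errors (XGBoost itself computes gain from gradient statistics).

```python
from statistics import fmean


def sse(values):
    """Sum of squared errors around the mean; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def best_missing_direction(xs, ys, threshold):
    """For the split x <= threshold, send all records with a missing
    feature value (None) left, then right, and return the direction with
    the larger gain together with that gain."""
    left = [y for x, y in zip(xs, ys) if x is not None and x <= threshold]
    right = [y for x, y in zip(xs, ys) if x is not None and x > threshold]
    missing = [y for x, y in zip(xs, ys) if x is None]
    parent = sse(list(ys))
    # Assumed gain definition: parent SSE minus summed SSE of the children.
    gain_left = parent - (sse(left + missing) + sse(right))
    gain_right = parent - (sse(left) + sse(right + missing))
    if gain_left >= gain_right:
        return "left", gain_left
    return "right", gain_right


# Toy data (made up): the two records with a missing feature value have
# targets close to those of the left group.
xs = [1.0, 2.0, None, 10.0, 11.0, None]
ys = [5.0, 6.0, 5.5, 20.0, 21.0, 6.5]
direction, gain = best_missing_direction(xs, ys, threshold=6.0)
print(direction, gain)
```

Here the missing records are routed left because their targets resemble the left child's, which keeps the variance within both children small.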

Node details

Input ports

Type: Table
Input Data
The data to learn from. It must contain at least one numeric target column and either a fingerprint (bit/byte/double-vector) column or another numeric or nominal column.

Output ports

Type: Regression Tree
Regression Tree Model
The trained model.
