Learns a single regression tree. The procedure follows the algorithm described in "Classification and Regression Trees" (Breiman et al., 1984), but the current implementation applies a few simplifications: for example, it does not prune, and the resulting trees are not necessarily binary.

In a regression tree the predicted value for a leaf node is the mean target value of the records within the leaf. Hence the predictions are best (with respect to the training data) if the variance of target values within a leaf is minimal. This is achieved by splits that minimize the sum of squared errors in their respective children.
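As a rough illustration of this split criterion, the following minimal sketch searches for the threshold on a single numeric attribute that minimizes the summed squared error of the two children. The function names `sse` and `best_threshold` are hypothetical, and the sketch assumes a binary split on one attribute with no missing values; it is not the node's actual implementation.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_threshold(xs, ys):
    """Return (threshold, children_sse) for the best binary split on xs.

    Hypothetical helper: candidate thresholds are the midpoints between
    consecutive distinct attribute values.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        # Identical attribute values cannot be separated by a threshold.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best
```

Each child would predict the mean of its own target values, so the split with the smallest summed squared error is also the one whose leaves fit the training data best.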

The missing value handling also differs from the one used by Breiman et al. (1984). In each split, the algorithm tries to find the best direction for missing values by sending them in each direction and selecting the one that yields the best result (i.e. the largest gain). The procedure is adapted from the well-known XGBoost algorithm.
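The direction search for missing values can be sketched roughly as follows. This is a simplified illustration and not the node's actual code: it assumes a fixed numeric threshold, represents missing values as `None`, and measures gain as the reduction in summed squared error relative to the parent node. The names `sse` and `missing_direction` are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def missing_direction(xs, ys, threshold):
    """Return (direction, gain) for records whose attribute is None.

    Hypothetical helper: tries both directions for the missing records
    and keeps the one with the larger gain (ties go to "left").
    """
    parent = sse(ys)
    left = [y for x, y in zip(xs, ys) if x is not None and x <= threshold]
    right = [y for x, y in zip(xs, ys) if x is not None and x > threshold]
    missing = [y for x, y in zip(xs, ys) if x is None]

    # Gain = error of the parent minus the summed error of the children.
    gain_left = parent - (sse(left + missing) + sse(right))
    gain_right = parent - (sse(left) + sse(right + missing))

    if gain_left >= gain_right:
        return "left", gain_left
    return "right", gain_right
```

In this sketch, missing records whose targets resemble the left child's targets end up on the left, because that assignment leaves the least error in the children.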