Spark Decision Tree Learner (Regression)

Node / Learner

This node uses the spark.ml implementation to train a decision tree regression model in Spark. The underlying algorithm performs a recursive binary partitioning of the feature space. At each tree node, the split that maximizes the information gain is chosen from a set of candidate splits. Information gain is calculated with a variance-based quality measure. The target column must be numerical, whereas the feature columns can be either nominal or numerical.
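To make the variance-based split criterion concrete, the following is a minimal sketch in plain Python, not the spark.ml implementation. It assumes a single numerical feature, uses every observed feature value as a candidate threshold, and measures gain as the parent variance minus the size-weighted variances of the two children. The function names (variance_gain, best_threshold) and the sample data are hypothetical and chosen for illustration only.

```python
from statistics import pvariance


def variance(values):
    """Population variance of a list; an empty list counts as 0.0."""
    return pvariance(values) if values else 0.0


def variance_gain(parent, left, right):
    """Parent variance minus the size-weighted variance of both children."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    )
    return variance(parent) - weighted_children


def best_threshold(xs, ys):
    """Return (threshold, gain) of the best binary split 'x <= threshold'.

    Assumption for illustration: every distinct feature value except the
    largest one is tried as a candidate threshold.
    """
    best = (None, 0.0)
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        gain = variance_gain(ys, left, right)
        if gain > best[1]:
            best = (t, gain)
    return best


# Hypothetical training data: one feature column and a numerical target.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]

threshold, gain = best_threshold(xs, ys)
print(threshold, gain)
```

In this toy data the target is constant on each side of the split at 3.0, so both children have zero variance and the gain equals the variance of the full target column. A tree learner would then repeat the same search recursively inside each child partition.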

Use the Spark Predictor (Regression) node to apply the learned model to unseen data.

Please refer to the Spark documentation for a full description of the underlying algorithm.

This node requires at least Apache Spark 2.0.

Node details

Input ports

Type: Spark Data
Input data
Input Spark DataFrame with training data.

Output ports

Type: Table
Feature importance measures
Table with estimates of the importance of each feature. The features are listed in order of decreasing importance, and the importance values are normalized to sum to 1. Note that feature importances for single decision trees can have high variance due to correlated predictor variables. Consider using the Spark Random Forest Learner to determine feature importance instead.
Type: Spark ML Model
Spark ML Decision Tree model (regression)
The trained Spark ML decision tree regression model. It can be passed to the Spark Predictor (Regression) node.
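As an illustration of the shape of the feature importance table described above, the sketch below normalizes raw per-feature scores so that they sum to 1 and lists them in order of decreasing importance. It is written in plain Python under stated assumptions: the raw scores, the feature names, and the helper normalize_importances are hypothetical, and the way Spark derives the raw scores from the tree is not reproduced here.

```python
def normalize_importances(raw_scores):
    """Scale scores to sum to 1 and sort them by decreasing importance.

    raw_scores: dict mapping feature name -> non-negative raw score.
    Returns a list of (feature, normalized_importance) tuples.
    """
    total = sum(raw_scores.values())
    if total == 0:
        # No feature contributed to any split; report zeros as-is.
        return [(name, 0.0) for name in raw_scores]
    normalized = {name: score / total for name, score in raw_scores.items()}
    return sorted(normalized.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores for three feature columns.
raw = {"age": 2.0, "income": 6.0, "city": 0.0}

table = normalize_importances(raw)
for feature, importance in table:
    print(feature, importance)
```

Because the values are relative shares, a feature's importance can change noticeably when a correlated feature is added or removed, which is one reason the estimates from a single tree are unstable.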
