Spark Gradient Boosted Trees Learner

Gradient Boosted Trees are ensembles of Decision Trees. Learning a Gradient Boosted Trees model means training a sequence of Decision Trees one by one in order to minimize a loss function. This node uses the spark.ml Gradient Boosted Trees implementation to train a classification model in Spark, using a logistic loss function.
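
To make the idea of training trees one by one against a logistic loss more concrete, here is a minimal sketch in plain Python. It is not the spark.ml implementation: it assumes a single numeric feature, labels encoded as 0/1, depth-1 trees (stumps), and a fixed learning rate. All function names (`train_gbt`, `fit_stump`, and so on) and the toy data are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(ys, scores):
    """Mean logistic loss for 0/1 labels and raw (pre-sigmoid) scores."""
    eps = 1e-12
    total = 0.0
    for y, f in zip(ys, scores):
        p = min(max(sigmoid(f), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(ys)


def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must leave data on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def train_gbt(xs, ys, n_trees=20, learning_rate=0.5):
    """Add one stump per round, each fitted to the negative loss gradient."""
    scores = [0.0] * len(xs)
    stumps = []
    losses = [log_loss(ys, scores)]
    for _ in range(n_trees):
        # For the logistic loss, the negative gradient is (label - probability).
        residuals = [y - sigmoid(f) for y, f in zip(ys, scores)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        scores = [f + learning_rate * predict_stump(stump, x)
                  for f, x in zip(scores, xs)]
        losses.append(log_loss(ys, scores))
    return stumps, losses


def predict(stumps, x, learning_rate=0.5):
    """Classify x by summing the scaled stump outputs and thresholding at 0.5."""
    score = sum(learning_rate * predict_stump(s, x) for s in stumps)
    return 1 if sigmoid(score) >= 0.5 else 0


# Toy, linearly separable data (made up for this sketch).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]
stumps, losses = train_gbt(xs, ys)
```

On this toy data the recorded loss shrinks with every added stump, which is the behaviour the description above refers to. A real implementation such as the one in spark.ml handles many features, deeper trees, and distributed data, none of which this sketch attempts.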

Note that only binary classification is supported. The target column must be nominal (with two distinct values), whereas the feature columns can be either nominal or numerical.

Use the Spark Predictor (Classification) node to apply the learned model to unseen data.

Please refer to the Spark documentation for a full description of the underlying algorithm.

This node requires at least Apache Spark 2.0.

Node details

Input ports

Type: Spark Data
Input data
Input Spark DataFrame with training data.

Output ports

Type: Table
Feature importance measures
Table with estimates of the importance of each feature. Features are listed in order of decreasing importance, and the importance values are normalized to sum to 1.
Type: Spark ML Model
Spark ML Gradient Boosted Trees model (classification)
Spark ML Gradient Boosted Trees model (classification)
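
As a rough illustration of how the feature importance table is laid out (values normalized to sum to 1, rows sorted by decreasing importance), the following Python sketch post-processes a set of raw scores. The feature names and raw values are made up, and `normalize_importances` is a hypothetical helper, not part of the node or of Spark.

```python
def normalize_importances(raw):
    """Scale raw scores to sum to 1 and sort by decreasing importance."""
    total = sum(raw.values())
    if total == 0:
        return [(name, 0.0) for name in raw]
    normalized = {name: value / total for name, value in raw.items()}
    return sorted(normalized.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw importance scores for three features.
raw_scores = {"age": 3.0, "income": 5.0, "city": 2.0}
ranked = normalize_importances(raw_scores)

for name, importance in ranked:
    print(f"{name}: {importance:.2f}")
```

Because the values sum to 1, each one can be read as the share of total importance attributed to that feature.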
