This node uses the spark.ml implementation to train a decision tree regression model in Spark. The underlying algorithm recursively partitions the feature space with binary splits. At each tree node, it picks the split, from a set of candidate splits, that maximizes the information gain. Information gain is calculated with a variance-based quality measure. The target column must be numerical, whereas the feature columns can be either nominal or numerical.
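To make the variance-based split selection described above more concrete, here is a minimal plain-Python sketch for a single numerical feature. It is an illustration only, not the spark.ml code: the function names (`variance`, `information_gain`, `best_split`) are hypothetical, and the candidate thresholds are assumed to be midpoints between consecutive distinct feature values, whereas Spark derives its candidate splits from binned feature values. The gain is taken as the variance of the parent node minus the size-weighted variances of the two child nodes.

```python
from statistics import pvariance


def variance(targets):
    """Population variance of the target values in a node (0.0 if fewer than 2 rows)."""
    return pvariance(targets) if len(targets) > 1 else 0.0


def information_gain(parent, left, right):
    """Parent variance minus the size-weighted variances of the two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return variance(parent) - weighted_children


def best_split(feature, targets):
    """Return (threshold, gain) of the best binary split on one numerical feature.

    Assumption for this sketch: candidate thresholds are the midpoints between
    consecutive distinct feature values. Returns (None, 0.0) if no split helps.
    """
    distinct = sorted(set(feature))
    best_threshold, best_gain = None, 0.0
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(feature, targets) if x <= threshold]
        right = [y for x, y in zip(feature, targets) if x > threshold]
        gain = information_gain(targets, left, right)
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain
```

Calling `best_split(feature, targets)` on two equally long lists returns the chosen threshold and its gain. A full tree learner would repeat this search over all features and then recurse into the two resulting partitions until a stopping criterion is met.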
Use the Spark Predictor (Regression) node to apply the learned model to unseen data.
Please refer to the Spark documentation for a full description of the underlying algorithm.
This node requires at least Apache Spark 2.0.