This node uses the spark.ml Decision Tree implementation to train a decision tree classification model in Spark. The underlying algorithm recursively partitions the feature space with binary splits. At each tree node, it evaluates a set of candidate splits and picks the one that maximizes the information gain. Both binary and multiclass classification are supported. The target column must be nominal, whereas the feature columns can be either nominal or numerical.
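To illustrate what maximizing the information gain means, here is a minimal pure-Python sketch, not the Spark implementation. It assumes entropy as the impurity measure (Spark also offers Gini impurity) and a single numerical feature; the function names, the toy data, and the candidate thresholds are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    total = len(parent)
    weighted = (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)
    return entropy(parent) - weighted


def best_threshold(values, labels, candidates):
    """Return (threshold, gain) for the candidate split with the highest gain."""
    best = (None, 0.0)
    for t in candidates:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        if not left or not right:
            continue  # a binary split must send rows to both children
        gain = information_gain(labels, left, right)
        if gain > best[1]:
            best = (t, gain)
    return best


# Hypothetical toy data: one numerical feature and a nominal target.
ages = [22, 25, 31, 38, 45, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, gain = best_threshold(ages, bought, candidates=[25, 31, 45])
```

On this toy data the split `age <= 31` separates the two classes completely, so it yields the highest gain of the three candidates. A tree learner repeats this selection recursively on each resulting partition.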
Use the Spark Predictor (Classification) node to apply the learned model to unseen data.
Please refer to the Spark documentation for a full description of the underlying algorithm.
This node requires at least Apache Spark 2.0.