A random forest* is an ensemble of decision trees. Learning a random forest model means training a set of decision trees independently of one another, which allows the trees to be trained in parallel. This node uses the spark.ml random forest implementation to train a classification model in Spark. The target column must be nominal, whereas the feature columns can be either nominal or numerical.
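To illustrate the ensemble idea described above, here is a minimal sketch in plain Python. It is not the spark.ml implementation and does not reflect how this node works internally. It assumes numerical features only, uses one-level trees (decision stumps) instead of full decision trees, and combines the trees by simple majority vote. The names train_stump, train_forest and predict are hypothetical helpers introduced only for this example.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature_indices):
    """Train a one-level tree: choose the (feature, threshold) split with the
    fewest misclassifications among the given candidate features."""
    best = None
    for f in feature_indices:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # not a real split
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # No usable split in this sample: always predict the majority class.
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def train_forest(rows, labels, num_trees, seed):
    """Train num_trees stumps, each on its own bootstrap sample and its own
    random feature subset. The trees do not depend on each other, so this
    loop could be run in parallel."""
    rng = random.Random(seed)
    n = len(rows)
    num_features = len(rows[0])
    subset_size = max(1, int(num_features ** 0.5))  # assumed subset size
    forest = []
    for _ in range(num_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        features = rng.sample(range(num_features), subset_size)
        forest.append(
            train_stump(
                [rows[i] for i in sample],
                [labels[i] for i in sample],
                features,
            )
        )
    return forest


def predict(forest, row):
    """Let every tree vote and return the most frequent class label."""
    votes = Counter(
        left_label if row[f] <= threshold else right_label
        for f, threshold, left_label, right_label in forest
    )
    return votes.most_common(1)[0][0]


# Toy data (made up for this example): two numerical features, nominal target.
rows = [
    (1.0, 1.2), (1.5, 0.8), (2.0, 1.0), (1.2, 1.9),
    (8.0, 9.0), (9.0, 8.5), (8.5, 9.5), (9.5, 8.0),
]
labels = ["low"] * 4 + ["high"] * 4

forest = train_forest(rows, labels, num_trees=25, seed=7)
print(predict(forest, (0.5, 0.5)))
print(predict(forest, (10.0, 10.0)))
```

Because each tree sees a different bootstrap sample and feature subset, single trees may disagree, and the vote across all trees is what makes the ensemble more stable than any one tree.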
Use the Spark Predictor (Classification) node to apply the learned model to unseen data.
Please refer to the Spark documentation for a full description of the underlying algorithm.
This node requires at least Apache Spark 2.0.
(*) RANDOM FORESTS is a registered trademark of Minitab, LLC and is used with Minitab’s permission.