Learns a random forest*, which consists of a chosen number of decision trees. Each decision tree model is learned on a different set of rows (records) and a different set of columns (describing attributes); the latter may also be a bit-vector or byte-vector descriptor (e.g. a molecular fingerprint). The row set for each decision tree is created by bootstrapping and has the same size as the original input table. For each node of a decision tree, a new set of candidate attributes is determined by taking a random sample of size sqrt(m), where m is the total number of attributes. The output model describes a random forest and is applied in the corresponding predictor node using a simple majority vote.
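The two sampling steps and the vote described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the node's actual implementation; the toy table sizes and vote values are made up for the example.

```python
import math
import random
from collections import Counter

random.seed(0)

n_rows, m_attrs = 10, 9  # toy table: 10 records, 9 attributes

# Bootstrapping: sample row indices with replacement; the sample has
# the same size as the original input table.
bootstrap = [random.randrange(n_rows) for _ in range(n_rows)]

# Per-node attribute sampling: a fresh random subset of size sqrt(m)
# is drawn for every node of the tree.
k = int(math.sqrt(m_attrs))                 # sqrt(9) = 3 candidate attributes
candidates = random.sample(range(m_attrs), k)

# Prediction: the predictor node combines the trees' individual class
# votes by a simple majority vote (hypothetical votes shown here).
tree_votes = ["A", "B", "A", "A", "B"]
prediction = Counter(tree_votes).most_common(1)[0][0]
print(prediction)  # "A" wins with 3 of 5 votes
```

Note that the bootstrap sample typically contains duplicates and omits some rows; the omitted ("out-of-bag") rows are what make per-tree error estimates possible in random forests generally.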
This node provides a subset of the functionality of the Tree Ensemble Learner, configured to correspond to a random forest. If you need additional functionality, please check out the Tree Ensemble Learner.
Experiments have shown that the results on different data sets are very similar to those of the random forest implementation available in R. Known differences concern missing value handling (currently not available in this node) and split creation for nominal attributes: the original random forest classifier uses binary nominal splits, whereas this implementation creates a child node for each possible value of the split attribute.
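The nominal-split difference noted above can be illustrated with a toy attribute. This is a hedged sketch, not code from either implementation; the attribute values and the example binary partition are invented for illustration.

```python
# A toy nominal attribute with three possible values.
values = ["red", "green", "blue"]

# This node: a multiway split creates one child node per value,
# so a three-valued attribute yields three children.
multiway_children = {v: f"subtree({v})" for v in values}
print(len(multiway_children))  # 3 child nodes

# A binary nominal split (as in the original random forest classifier)
# instead partitions the values into two groups, always yielding two
# children, e.g.:
binary_split = ({"red"}, {"green", "blue"})
print(len(binary_split))  # 2 child nodes
```

Multiway splits consume a nominal attribute in a single node, while binary splits may reuse the same attribute deeper in the tree with a different value partition; this is one reason the two implementations can produce slightly different trees on the same data.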
The decision tree construction takes place entirely in main memory: all data and all models are kept in memory during learning.
(*) RANDOM FORESTS is a registered trademark of Minitab, LLC and is used with Minitab’s permission.