
Random Forest Learner

Analytics > Mining > Decision Tree Ensemble > Random Forest > Classification

Learns a random forest*, which consists of a chosen number of decision trees. Each of the decision tree models is built with a different set of rows (records) and for each split within a tree a randomly chosen set of columns (describing attributes) is used. The row sets for each decision tree are created by bootstrapping and have the same size as the original input table. The attribute set for an individual split in a decision tree is determined by randomly selecting sqrt(m) attributes from the available attributes where m is the total number of learning columns. The attributes can also be provided as bit (fingerprint), byte, or double vector. The output model describes a random forest and is applied in the corresponding predictor node.
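
The two sampling steps described above — bootstrapped row sets of the same size as the input table, and sqrt(m) randomly chosen attributes per split — can be sketched as follows (a minimal illustration with made-up helper names, not KNIME's actual implementation):

```python
import math
import random

def bootstrap_rows(n_rows, rng):
    # Sample row indices with replacement; the sample has the
    # same size as the original input table, duplicates allowed.
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def split_candidates(n_columns, rng):
    # For each individual split, draw sqrt(m) of the m learning
    # columns at random (without replacement).
    k = max(1, math.isqrt(n_columns))
    return rng.sample(range(n_columns), k)

rng = random.Random(42)
rows = bootstrap_rows(10, rng)    # 10 indices, duplicates allowed
cols = split_candidates(16, rng)  # 4 of the 16 columns considered at this split
```

Each tree gets its own bootstrap sample, while the attribute sample is drawn anew at every split.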

This node provides the subset of the Tree Ensemble Learner's functionality that corresponds to a random forest. If you need additional functionality, please check out the Tree Ensemble Learner.

Experiments have shown that the results on different datasets are very similar to those of the random forest implementation available in R.

The decision tree construction takes place in main memory (all data and all models are kept in memory).

The missing value handling corresponds to the method described here. The basic idea is, for each split, to try sending the missing values in every possible direction; the direction yielding the best result (i.e., the largest gain) is then used. If no missing values are present during training, the missing values encountered during testing follow the direction of the split that most training records follow.
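
That idea can be sketched roughly as follows (hypothetical helper names; entropy gain stands in for whatever quality measure the node actually uses):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def split_gain(parent, children):
    # Information gain of a split: parent impurity minus the
    # size-weighted impurity of the child partitions.
    return entropy(parent) - sum(len(c) / len(parent) * entropy(c) for c in children)

def missing_direction(left, right, missing):
    # Try sending the missing-value rows to each side in turn;
    # keep the direction with the larger gain.
    parent = left + right + missing
    gain_left = split_gain(parent, [left + missing, right])
    gain_right = split_gain(parent, [left, right + missing])
    return "left" if gain_left >= gain_right else "right"

# Missing rows labelled "a" fit better with the pure "a" branch:
missing_direction(["a", "a"], ["b", "b"], ["a"])  # → "left"
```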

Nominal columns are split in a binary manner. The determination of the split depends on the kind of problem:

  • For two-class classification problems, the method described in Section 9.4 of "Classification and Regression Trees" by Breiman et al. (1984) is used.
  • For multi-class classification problems, the method described in "Partitioning Nominal Attributes in Decision Trees" by Coppersmith et al. (1999) is used.
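
For the two-class case, the practical consequence of Breiman's result is that the nominal values can be ordered by their fraction of one class, after which only the contiguous cut points in that order need to be evaluated, instead of all 2^(k-1)-1 subsets. A sketch with hypothetical names:

```python
from collections import defaultdict

def order_by_class_fraction(records):
    # records: (nominal_value, label) pairs of a two-class problem,
    # with the positive class encoded as 1.
    pos = defaultdict(int)
    tot = defaultdict(int)
    for value, label in records:
        tot[value] += 1
        pos[value] += (label == 1)
    # Sorting by P(class 1 | value) lets the nominal column be
    # treated like an ordered one: only k-1 cut points remain.
    return sorted(tot, key=lambda v: pos[v] / tot[v])

data = [("red", 1), ("red", 1), ("blue", 0), ("blue", 1), ("green", 0)]
order_by_class_fraction(data)  # → ['green', 'blue', 'red']
```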

(*) RANDOM FORESTS is a registered trademark of Minitab, LLC and is used with Minitab’s permission.

Node details

Input ports
  1. Type: Table
    Input Data
    The data to be learned from. The table must contain at least one nominal target column and either a fingerprint (bit-vector/byte-vector) column or another numeric or nominal column.
Output ports
  1. Type: Table
    Out-of-bag Predictions
    The input data with the out-of-bag predictions, i.e. for each input row the majority vote of all models that did not use that row during training. The appended columns are equivalent to the columns appended by the corresponding predictor node. There is one additional column, model count, which contains the number of models used for the voting (the number of models that did not use the row during learning). The out-of-bag predictions can be used to estimate the generalization error of the random forest by feeding them into the Scorer node.
  2. Type: Table
    Attribute Statistics
    A statistics table on the attributes used in the different trees. Each row represents one training attribute with these statistics: #splits (level x) is the number of models that use the attribute as a split at level x (level 0 being the root split); #candidates (level x) is the number of times the attribute was in the attribute sample for level x (in a random forest setup these samples differ from node to node). If no attribute sampling is used, #candidates equals the number of models. Note that these numbers are uncorrected: if an attribute is selected as the split at level 0 but also appears in the candidate set of level 1 (where it will not be split on again, having been split one level up), the #candidates number still counts it as a candidate.
  3. Type: Tree Ensembles
    Random Forest Model
    The trained model.
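
The out-of-bag vote of the first output port can be sketched like this (a hypothetical data layout; the node tracks per-row in-bag information internally):

```python
from collections import Counter

def oob_predictions(tree_predictions, in_bag):
    """tree_predictions[t][i]: class predicted by tree t for row i.
    in_bag[t]: set of row indices tree t was trained on."""
    n_rows = len(tree_predictions[0])
    out = []
    for i in range(n_rows):
        votes = Counter(
            preds[i]
            for preds, bag in zip(tree_predictions, in_bag)
            if i not in bag  # only models that never saw row i may vote
        )
        winner = votes.most_common(1)[0][0] if votes else None
        out.append((winner, sum(votes.values())))  # (prediction, model count)
    return out

preds = [["a", "x"], ["b", "a"], ["a", "a"]]
bags = [{1}, {0}, set()]
oob_predictions(preds, bags)  # → [('a', 2), ('a', 2)]
```

The second element of each tuple corresponds to the model count column described above.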


Related workflows & nodes

  1. Integrated Deployment
    "Integrated Deployment" exercise for the advanced Life Science User Training - capture th…
    knime > Education > Courses > L2-LS KNIME Analytics Platform for Data Scientists - Life Sciences - Advanced > Exercises > 04.1. Integrated Deployment
  2. Integrated Deployment
    "Integrated Deployment" exercise for the advanced Life Science User Training - capture th…
    ioneliad > Public > ID_Workflow_Group > L2-LS KNIME Analytics Platform for Data Scientists - Life Sciences - Advanced > Exercises > 04.1. Integrated Deployment
  3. Lab3.6
    doctorbrunson > Lab3.6 > Lab3.6
  4. Just KNIME it 24
    berti093 > Public > Just KNIME it 24
  5. Adv_Tree_Chap5_Random_Forest
    keith_mccormick > Adv Trees with KNIME > Adv_Tree_Chap5_Random_Forest
  6. Hyperparameters Optimization and Training a Random Forest
    This workflow optimizes the hyperparameters of a random forest of decision trees and trai…
    knime > Examples > 06_Control_Structures > 04_Loops > 21_Parameter_optimization_loop > parameter_optimization_simple
  7. Parameter Optimization and Cross Validation
    julian.bunzel > Public > Small examples > Parameter Optimization and Cross Validation
  8. Prediction Service Consumer
    This workflow calls a workflow service which applies a model to data. Data are generated …
    knime > Examples > 06_Control_Structures > 07_Workflow_Orchestration > 04_Call_Workflow_Service > 02_Prediction_Service_Consumer
  9. Machine Learning Chemistry
    This workflow snippet demonstrates how to train a bioactivity model using chemical struct…
    knime > Workflow Snippets > Machine Learning Chemistry
  10. Prediction Service Consumer
    carlwitt > Public > Call Workflow > Prediction Service Consumer

KNIME
Open for Innovation

KNIME AG
Talacker 50
8001 Zurich, Switzerland
© 2023 KNIME AG. All rights reserved.