This workflow shows how to perform cross-validation in H2O using the KNIME H2O nodes. In the example, we use the H2O Random Forest to predict the multiclass response of the Iris data set with 5-fold cross-validation and evaluate the cross-validated performance.
1. Import Data:
Import the Iris data into H2O.
2. Cross Validation:
To perform cross-validation with the KNIME H2O nodes, we use the "H2O Cross Validation Loop Start" node and configure it for 5-fold cross-validation with stratified fold assignment. The upper output port contains the training data and the lower output port contains the test data.
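The idea behind stratified fold assignment can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the implementation used by the KNIME node or by H2O; the function names `stratified_folds` and `cv_splits` are hypothetical, and the label list simply mimics the Iris class layout (50 rows per class).

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each row index to one of k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    fold_of = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the rows of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            fold_of[idx] = position % k
    return fold_of


def cv_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    fold_of = stratified_folds(labels, k, seed)
    for fold in range(k):
        train = [i for i, f in enumerate(fold_of) if f != fold]
        test = [i for i, f in enumerate(fold_of) if f == fold]
        yield train, test


# Iris-like labels: 150 rows, 50 per class (assumed layout for illustration).
labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
splits = list(cv_splits(labels, k=5))
```

Each of the five iterations yields one training partition and one test partition, which corresponds to the two output ports of the loop start node. With 50 rows per class, every test fold in this sketch holds 10 rows of each class.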
3. Learn Models in Cross Validation Loop:
For each CV fold, H2O builds a Random Forest with 50 trees of maximum depth 15 using the training data of the corresponding fold. The test data of the fold is then predicted, adding the class-specific membership probabilities (needed for multinomial scoring), and scored by the "H2O Multinomial Scorer" node.
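To show why the class probabilities are needed for scoring, here is a minimal sketch of two multinomial metrics computed from predicted probabilities. It is not the scorer node's implementation; the functions `accuracy` and `log_loss`, the clipping constant `eps`, and the three example rows are assumptions made up for illustration.

```python
import math


def accuracy(true_labels, probabilities):
    """Share of rows whose most probable class equals the true class."""
    correct = 0
    for truth, probs in zip(true_labels, probabilities):
        predicted = max(probs, key=probs.get)
        if predicted == truth:
            correct += 1
    return correct / len(true_labels)


def log_loss(true_labels, probabilities, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    total = 0.0
    for truth, probs in zip(true_labels, probabilities):
        # Clip to avoid log(0) when a class receives probability 0.
        p = min(max(probs[truth], eps), 1 - eps)
        total -= math.log(p)
    return total / len(true_labels)


# Made-up predictions for three test rows (illustrative values only).
true_labels = ["setosa", "versicolor", "virginica"]
probabilities = [
    {"setosa": 0.90, "versicolor": 0.05, "virginica": 0.05},
    {"setosa": 0.20, "versicolor": 0.70, "virginica": 0.10},
    {"setosa": 0.10, "versicolor": 0.60, "virginica": 0.30},
]

fold_accuracy = accuracy(true_labels, probabilities)
fold_log_loss = log_loss(true_labels, probabilities)
```

Accuracy only needs the predicted class, whereas LogLoss depends on how much probability the model assigned to the true class, so it cannot be computed from class predictions alone.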
To evaluate the overall performance of all trained random forests, we use the "GroupBy" node to average the per-fold performance metrics, such as Accuracy and LogLoss.
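The aggregation step amounts to taking the mean of each metric over the folds. The sketch below assumes one row of scores per fold; the column names and all numeric values are hypothetical placeholders, not results from this workflow.

```python
from statistics import mean


def average_metrics(rows, metrics):
    """Average each named metric over all per-fold result rows."""
    return {metric: mean(row[metric] for row in rows) for metric in metrics}


# Hypothetical per-fold scores (placeholder values, not real results).
fold_scores = [
    {"fold": 0, "Accuracy": 0.90, "LogLoss": 0.2},
    {"fold": 1, "Accuracy": 1.00, "LogLoss": 0.1},
    {"fold": 2, "Accuracy": 0.90, "LogLoss": 0.3},
    {"fold": 3, "Accuracy": 1.00, "LogLoss": 0.1},
    {"fold": 4, "Accuracy": 0.95, "LogLoss": 0.3},
]

cv_summary = average_metrics(fold_scores, ["Accuracy", "LogLoss"])
```

Note that an unweighted mean of per-fold accuracy equals the accuracy over all test rows only when the folds have the same size, which is the case for 5 folds on 150 rows.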