Classification Model Evaluation using Giskard

This workflow uses the Giskard Scanner node to scan a machine-learning workflow for common weaknesses.

The Giskard Scanner node is meant to support fine-tuning your ML model while you develop it in KNIME Analytics Platform. The example therefore mimics a data scientist's successive attempts at modifying models and workflows before putting them into production.

Giskard can also be used after the model is deployed in production. Monitoring a workflow with Giskard makes the data scientist aware of issues with the production data flowing into the model, such as data shift.

You can download and run the workflow directly in your KNIME Analytics Platform. We recommend using the latest version of KNIME AP for optimal performance.

The goal of the model is to predict each student's grade class (the GradeClass column) from a set of features.

Workflow Steps

Reading the Data
- The data is embedded into the workflow, so you can just execute the node.
Partitioning into Train and Test
- The partitioning node splits the data into 70% training data and 30% test data. The test data is used later to evaluate the model with Giskard.
Various Attempts at Fine-Tuning ML Models
- The workflow shows several stages of fine-tuning ML models to illustrate how to use Giskard while developing your ML workflow in KNIME.
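
For readers who think in code, the 70/30 partitioning step above can be sketched in plain Python. This is only an illustration of the idea, not what the KNIME partitioning node does internally: the function name `partition`, the random sampling mode, and the seed are assumptions, and the integer list stands in for the student records.

```python
import random

def partition(rows, train_fraction=0.7, seed=42):
    """Shuffle rows and split them into (train, test) lists."""
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))                    # stand-in for 100 student records
train, test = partition(rows)
print(len(train), len(test))
```

The held-out `test` part plays the role of the 30% partition that is later passed to Giskard for evaluation.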

Attempt 1

Model Training
- Train a Random Forest classification model on the dataset using GradeClass as the target column.
Capture Inference Workflow
- The inference workflow is enclosed by the Capture Workflow Start and Capture Workflow End nodes, which extract the enclosed workflow snippet into a dedicated workflow port object.
- The inference workflow uses the model predictor to create the classification output. Besides the predicted class, the predictor also outputs the class probabilities, which are used by Giskard.
Evaluate Inference Workflow with Giskard
- The Giskard Scanner node checks the inference workflow for common weaknesses in the robustness, spurious correlation, performance, and overconfidence categories. Please refer to the workflow annotations for further details.
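
To show why the class probabilities matter for the scan, here is a minimal Python sketch of one of the categories above, overconfidence: a wrong prediction made with very high probability. It is a simplified illustration under stated assumptions, not Giskard's actual detector. The function names, the 0.9 threshold, and the class labels "A" to "C" are all hypothetical.

```python
def predict_with_probabilities(probabilities):
    """Return (predicted_class, confidence) from a class -> probability mapping."""
    predicted = max(probabilities, key=probabilities.get)
    return predicted, probabilities[predicted]

def overconfident_errors(records, threshold=0.9):
    """Collect wrong predictions whose confidence is at or above the threshold."""
    flagged = []
    for true_class, probabilities in records:
        predicted, confidence = predict_with_probabilities(probabilities)
        if predicted != true_class and confidence >= threshold:
            flagged.append((true_class, predicted, confidence))
    return flagged

# Hypothetical predictor output: (true GradeClass, class probabilities)
records = [
    ("A", {"A": 0.80, "B": 0.15, "C": 0.05}),  # correct prediction
    ("B", {"A": 0.95, "B": 0.03, "C": 0.02}),  # wrong and very confident
    ("C", {"A": 0.10, "B": 0.50, "C": 0.40}),  # wrong but uncertain
]
print(overconfident_errors(records))
```

Only the second record is flagged, because it is both wrong and confident. Without the probability columns, this kind of check could not be computed from the predicted class alone.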

Attempt 2

Model Training
- Train a Random Forest classification model on the dataset using GradeClass as the target column. Based on previous learnings, remove the Absence variable and include the GPA variable.
Capture Inference Workflow
- Same as in Attempt 1.
Evaluate Inference Workflow with Giskard
- Please refer to the workflow annotations for further details.
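
The feature change in this attempt amounts to filtering columns before training. A minimal sketch in Python, assuming the step is read as dropping Absence while keeping GPA: the helper `drop_columns`, the column `StudyTime`, and all values are hypothetical. Only Absence, GPA, and GradeClass are named in this description.

```python
def drop_columns(rows, excluded):
    """Return copies of the rows without the excluded columns."""
    return [
        {name: value for name, value in row.items() if name not in excluded}
        for row in rows
    ]

# Hypothetical record for illustration only
rows = [{"StudyTime": 12.5, "Absence": 3, "GPA": 3.1, "GradeClass": "B"}]

attempt_2_rows = drop_columns(rows, {"Absence"})  # Absence removed, GPA kept
print(attempt_2_rows)
```

The original rows are left untouched, so each attempt can start from the same data with a different set of excluded columns.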

Final Attempt

Model Training
- Train a Random Forest classification model on the dataset using GradeClass as the target column. Based on previous learnings, remove the GPA variable.
Capture Inference Workflow
- The captured part of the workflow includes some preprocessing operations, binning, and outlier handling. The inference workflow applies the preprocessing operations, followed by the model predictor, to create the classification output. In addition to the predicted class, the predictor also outputs the class probabilities, which are used by Giskard.
Evaluate Inference Workflow with Giskard
- Please refer to the workflow annotations for further details.
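
The description does not say which binning or outlier-handling methods the workflow uses. As a rough idea of what such preprocessing can look like, the sketch below clamps values to the common 1.5 x IQR fences and then applies equal-width binning. Both techniques, the function names, and the sample values are assumptions chosen for illustration.

```python
import statistics

def clip_outliers(values, k=1.5):
    """Clamp values to the interval [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

def equal_width_bin(values, n_bins=3):
    """Map each value to a bin index in 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant input
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical feature values; 40.0 is an obvious outlier
study_hours = [2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 40.0]
clipped = clip_outliers(study_hours)
bins = equal_width_bin(clipped)
print(clipped)
print(bins)
```

Because these steps sit inside the captured workflow, the same transformations are applied to the data at inference time before the predictor runs.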

This workflow demonstrates how to fine-tune and evaluate ML models using the Giskard Scanner node in KNIME, helping you make models more robust and reliable before deployment.

Used extensions & nodes

KNIME Base nodes

KNIME Ensemble Learning Wrappers

KNIME Giskard Extension

KNIME Integrated Deployment

KNIME Statistics Nodes

KNIME Views
