This workflow uses the Giskard Scanner node to scan a machine-learning workflow for common weaknesses.
The Giskard Scanner node is intended to help you fine-tune your ML model while you develop it in KNIME Analytics Platform. The example therefore mimics a data scientist's successive attempts at modifying their models and workflows before putting them into production.
Giskard can also be used after the model is deployed in production. Monitoring a workflow with Giskard makes the data scientist aware of issues with the production data flowing into the model, such as data drift.
You can download and run the workflow directly in your KNIME Analytics Platform. We recommend using the latest version of KNIME AP for optimal performance.
The goal of the model is to predict the grade class of a group of students based on different features.
Workflow Steps
Reading the Data
The data is embedded into the workflow, so you can just execute the node.
Partitioning into Train and Test
The Partitioning node splits the data into 70% training data and 30% test data. The test data is used later to evaluate the model with Giskard.
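For readers who think in code, the 70/30 split can be sketched in Python with scikit-learn. This is only an illustration of the idea, not what the Partitioning node runs internally. The table below is synthetic, and the `StudyTimeWeekly` column name, the row count, and the random seed are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the student table embedded in the workflow.
rng = np.random.default_rng(0)
students = pd.DataFrame({
    "StudyTimeWeekly": rng.uniform(0, 20, size=200),  # assumed feature name
    "Absence": rng.integers(0, 30, size=200),
    "GradeClass": rng.integers(0, 5, size=200),       # target column
})

# Random split: 70% of the rows for training, the remaining 30% for testing.
train_df, test_df = train_test_split(students, test_size=0.3, random_state=42)
```

The held-out `test_df` plays the role of the 30% partition that is later passed to the scan.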
Model Fine-Tuning Attempts
The workflow captures several stages of fine-tuning an ML model to show how Giskard can be used while you develop your ML workflow in KNIME.
Attempt 1
Model Training
Train a Random Forest classification model on the dataset using GradeClass as the target column.
Capture Inference Workflow
The inference workflow is surrounded by the Capture Workflow Start and End nodes that extract the contained workflow snippet into a dedicated workflow port object.
The inference workflow uses the model predictor to create the classification output. Besides the predicted class, the predictor also outputs the class probabilities, which are used by Giskard.
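As a rough Python analogy for this step, the sketch below trains a random forest and then appends both the predicted class and one probability column per class to the test rows. The data is synthetic, and the `StudyTimeWeekly` feature, the output column names, and the helper `predict_with_probabilities` are hypothetical names chosen for the example. The KNIME predictor node may name its output columns differently.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic student data (assumption): more absences lead to a higher class index.
rng = np.random.default_rng(0)
n = 300
students = pd.DataFrame({
    "StudyTimeWeekly": rng.uniform(0, 20, size=n),  # assumed feature name
    "Absence": rng.integers(0, 30, size=n),
})
students["GradeClass"] = students["Absence"] // 6   # five classes, 0 to 4

features = ["StudyTimeWeekly", "Absence"]
train_df, test_df = train_test_split(students, test_size=0.3, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(train_df[features], train_df["GradeClass"])

def predict_with_probabilities(frame):
    """Return a copy of frame with the predicted class and per-class probabilities."""
    out = frame.copy()
    out["Prediction"] = model.predict(frame[features])
    proba = model.predict_proba(frame[features])
    for idx, label in enumerate(model.classes_):
        out[f"P(GradeClass={label})"] = proba[:, idx]
    return out

scored = predict_with_probabilities(test_df)
```

Keeping the probabilities next to the prediction matters because checks such as overconfidence need to know how certain the model was, not only which class it chose.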
Evaluate Inference Workflow with Giskard
The Giskard Scanner node checks the inference workflow for common weaknesses in the robustness, spurious correlation, performance, and overconfidence categories. Please refer to the workflow annotations for further details.
Attempt 2
Model Training
Train a Random Forest classification model on the dataset using GradeClass as the target column. Based on previous learnings, remove the Absence variable and include the GPA variable.
Capture Inference Workflow
Same as in Attempt 1.
Evaluate Inference Workflow with Giskard
Please refer to the workflow annotations for further details.
Final Attempt
Model Training
Train a Random Forest classification model on the dataset using GradeClass as the target column. Based on previous learnings, remove the GPA variable.
Capture Inference Workflow
The captured part of the workflow includes some preprocessing operations, binning, and outlier handling. The inference workflow applies these preprocessing operations and then the model predictor to create the classification output. In addition to the predicted class, the predictor also outputs the class probabilities, which are used by Giskard.
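The idea of capturing preprocessing together with the predictor resembles a scikit-learn `Pipeline`, sketched below under several assumptions. The data is synthetic, the `StudyTimeWeekly` column is a hypothetical feature, and the clipping bounds and number of bins are arbitrary illustrative values, not the settings used in the workflow.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, KBinsDiscretizer

# Synthetic training data (assumption).
rng = np.random.default_rng(1)
n = 300
students = pd.DataFrame({
    "StudyTimeWeekly": rng.uniform(0, 20, size=n),       # assumed feature name
    "Absence": rng.integers(0, 30, size=n).astype(float),
})
students["GradeClass"] = (students["Absence"] // 6).astype(int)
features = ["StudyTimeWeekly", "Absence"]

def clip_outliers(X):
    # Cap every value to the range [0, 25]; the bounds are illustrative only.
    return np.clip(np.asarray(X, dtype=float), 0.0, 25.0)

# Preprocessing and model travel together, so raw rows can be scored directly.
inference = Pipeline([
    ("outliers", FunctionTransformer(clip_outliers)),
    ("binning", KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="uniform")),
    ("model", RandomForestClassifier(n_estimators=100, random_state=42)),
])
inference.fit(students[features], students["GradeClass"])

# Raw, unprocessed rows: the second row has an extreme Absence value.
raw_rows = pd.DataFrame({"StudyTimeWeekly": [5.0, 5.0], "Absence": [25.0, 80.0]})
predicted = inference.predict(raw_rows)
probabilities = inference.predict_proba(raw_rows)
```

Because the outlier step runs inside the pipeline, the extreme value of 80 is capped to 25 before binning, so both rows receive the same prediction. Evaluating the whole bundle means the scan sees the model exactly as it would behave on raw production data.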
Evaluate Inference Workflow with Giskard
Please refer to the workflow annotations for further details.
This workflow demonstrates how to fine-tune and evaluate ML models using the Giskard Scanner node in KNIME, ensuring robust and reliable models before deployment.