This workflow uses the Giskard Scanner node to scan a machine-learning workflow for common weaknesses.
The Giskard Scanner node helps you fine-tune an ML model while you develop it in KNIME Analytics Platform. The example therefore mimics the successive attempts and modifications a data scientist makes to a model and its workflow before putting them into production.
Giskard can also be used after the model is deployed in production. Monitoring the workflow with Giskard alerts the data scientist to issues with the production data flowing into the model, such as data drift.
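To make the idea of data drift concrete, the following minimal sketch compares a numeric predictor in training data and production data. It is not how Giskard or KNIME detect drift; the function name, the mileage values, and the threshold of 1.0 are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def standardized_mean_shift(train_values, prod_values):
    """Return how far the production mean moved, in training standard deviations."""
    spread = stdev(train_values)
    if spread == 0:
        raise ValueError("training values have no spread")
    return abs(mean(prod_values) - mean(train_values)) / spread


# Hypothetical mileage readings (illustrative numbers, not the workflow's data)
train_mileage = [10_000, 20_000, 30_000, 40_000, 50_000]
prod_mileage = [60_000, 70_000, 80_000, 90_000, 100_000]

shift = standardized_mean_shift(train_mileage, prod_mileage)
drifted = shift > 1.0  # arbitrary threshold for this sketch
print(f"shift = {shift:.2f} standard deviations, drifted = {drifted}")
```

A single summary statistic like this only catches shifts in the mean; a production monitor would typically compare whole distributions.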
You can download and run the workflow directly in your KNIME Analytics Platform. We recommend using the latest version of KNIME AP for optimal performance.
The goal of the model is to predict the price of different vehicle models.
Workflow Steps
Reading the Data
The data is embedded into the workflow, so you can just execute the node.
Partitioning into Train and Test
The Partitioning node splits the data into 70% training data and 30% test data. The test data is used later to evaluate the model with Giskard.
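For readers more familiar with code than with KNIME nodes, a 70/30 split can be sketched as follows. The text does not state which sampling mode or seed the node uses, so the random shuffle and the seed value here are assumptions, and `partition` is a hypothetical helper name.

```python
import random


def partition(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into a training and a test part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for ten table rows
train, test = partition(rows)
print(len(train), len(test))
```

Fixing the seed means the same rows land in the test set on every run, which keeps scan results comparable between attempts.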
ML Model Fine-Tuning Attempts
The workflow shows two stages of fine-tuning an ML model to illustrate how to use Giskard while developing your ML workflow in KNIME.
Attempt 1
Model Training
Train a Gradient Boosted Trees regression model on the dataset, using Price as the target column and all other columns as predictors.
Capture Inference Workflow
The inference workflow is surrounded by the Capture Workflow Start and End nodes that extract the contained workflow snippet into a dedicated workflow port object.
The inference workflow uses the model predictor to create the regression output, that is, the predicted price, which Giskard uses to evaluate the model.
Evaluate Inference Workflow with Giskard
The Giskard Scanner node checks the inference workflow for common weaknesses in categories such as robustness, spurious correlation, performance, and overconfidence. Please refer to the workflow annotations for further details about this attempt's findings.
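The robustness category can be illustrated with a small sketch: perturb an input, predict again, and count how often the prediction moves by more than a tolerance. This is a conceptual illustration only, not Giskard's implementation. The toy price model, the `brand` and `mileage` columns, the uppercase perturbation, and the 5% tolerance are all assumptions.

```python
# Hypothetical per-brand price effects for a toy model (not from the workflow)
BRAND_PREMIUM = {"audi": 8000.0, "fiat": 500.0}


def predict_price(row):
    """Toy stand-in for the captured inference workflow."""
    premium = BRAND_PREMIUM.get(row["brand"], 0.0)  # unknown brands get no premium
    return 20000.0 + premium - 0.1 * row["mileage"]


def uppercase_brand(row):
    """Perturbation: change only the casing of a nominal value."""
    return {**row, "brand": row["brand"].upper()}


def fraction_changed(predict, rows, perturb, tolerance=0.05):
    """Share of rows whose prediction changes by more than the relative tolerance."""
    changed = 0
    for row in rows:
        before = predict(row)
        after = predict(perturb(row))
        if abs(after - before) > tolerance * abs(before):
            changed += 1
    return changed / len(rows)


rows = [
    {"brand": "audi", "mileage": 50000},
    {"brand": "fiat", "mileage": 50000},
]
fail_rate = fraction_changed(predict_price, rows, uppercase_brand)
print(f"{fail_rate:.0%} of predictions changed under the perturbation")
```

Here the casing change makes the toy model lose the brand effect, so the row with the larger premium fails the check while the other stays within tolerance.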
Final Attempt
Model Training
Train a Gradient Boosted Trees regression model on the dataset, using Price as the target column. Based on the findings of the first attempt, we exclude the variables model, condition, and color.
Capture Inference Workflow
The captured workflow segment now includes some preprocessing operations to handle perturbations in the nominal predictors. The inference workflow applies these preprocessing operations and then the model predictor to create the regression output, that is, the predicted price, which Giskard uses to evaluate the model.
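The text does not say which preprocessing operations the workflow applies. As one plausible example, the sketch below assumes they normalize casing and surrounding whitespace of nominal values before prediction. The toy model, the `brand` column, and all function names are hypothetical.

```python
# Hypothetical per-brand price effects for a toy model (not from the workflow)
BRAND_PREMIUM = {"audi": 8000.0, "fiat": 500.0}


def predict_price(row):
    """Toy stand-in for the trained model's predictor."""
    premium = BRAND_PREMIUM.get(row["brand"], 0.0)
    return 20000.0 + premium - 0.1 * row["mileage"]


def normalize_nominal(row, columns=("brand",)):
    """Assumed preprocessing: trim whitespace and lowercase nominal values."""
    cleaned = dict(row)
    for col in columns:
        cleaned[col] = str(cleaned[col]).strip().lower()
    return cleaned


def inference(row):
    """Preprocessing followed by prediction, as in the captured segment."""
    return predict_price(normalize_nominal(row))


clean = {"brand": "audi", "mileage": 50000}
perturbed = {"brand": "  AUDI ", "mileage": 50000}

print(predict_price(perturbed))  # without preprocessing the brand is not recognized
print(inference(perturbed))      # with preprocessing it matches the clean row
```

Because the preprocessing sits inside the captured segment, the scan evaluates it together with the model, so perturbed inputs are cleaned before they reach the predictor.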
Evaluate Inference Workflow with Giskard
Please refer to the workflow annotations for further details.
This workflow demonstrates how to fine-tune and evaluate ML models using the Giskard Scanner node in KNIME, helping you make models more robust and reliable before deployment.