This workflow uses the Giskard Scanner node to scan a machine learning workflow for common weaknesses.

You can download and run the workflow directly in your KNIME Analytics Platform. For optimal performance, we recommend using the latest version of KNIME AP.

Workflow steps

Reading the data
- The data is embedded in the workflow, so you can simply execute the reader node.
Partition into train and test
- The Partitioning node splits the data into 70% training data and 30% test data. The test data is used later to evaluate the model with Giskard.
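The 70/30 split can be sketched in plain Python for readers who want to see it outside KNIME. This is a minimal sketch that assumes simple random sampling with a fixed seed; the sampling mode actually configured in the Partitioning node (for example random or stratified) is not stated here, and the helper name `partition` is hypothetical.

```python
import random


def partition(rows, train_fraction=0.7, seed=42):
    """Randomly split rows into (train, test) with the given train fraction.

    Hypothetical helper: assumes simple random sampling, which may differ
    from the mode configured in the KNIME Partitioning node.
    """
    shuffled = list(rows)                   # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)   # fixed seed gives a reproducible split
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Usage with 10 dummy row IDs standing in for Titanic rows
train, test = partition(range(10))
print(len(train), len(test))  # prints: 7 3
```

Fixing the seed matters here because the test rows are reused later for the Giskard scan; a reproducible split keeps scan results comparable between runs.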
Preprocessing
- Handle missing values by replacing missing strings with '?' and missing numerical values with the column mean.
- Z-score normalize numerical columns for more stable logistic regression results.
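The two preprocessing steps above can be sketched as a fit/apply pair: the statistics are learned once on the training rows and then reused unchanged on new rows, which is what allows the same preprocessing to be replayed inside the inference workflow. This is a minimal sketch, not the KNIME nodes' implementation; the function names, the toy rows, and the use of the population standard deviation (`pstdev`) are assumptions.

```python
from statistics import mean, pstdev

MISSING_STRING = "?"


def fit_preprocessing(rows, numeric_cols, string_cols):
    """Learn imputation means and z-score parameters from training rows."""
    means, stds = {}, {}
    for col in numeric_cols:
        present = [r[col] for r in rows if r[col] is not None]
        means[col] = mean(present)
        # Impute first, then measure the spread of the imputed column,
        # mirroring the order of the two preprocessing steps.
        imputed = [means[col] if r[col] is None else r[col] for r in rows]
        std = pstdev(imputed)  # assumption: population standard deviation
        stds[col] = std if std > 0 else 1.0  # avoid division by zero
    return {"means": means, "stds": stds,
            "numeric": list(numeric_cols), "string": list(string_cols)}


def apply_preprocessing(row, params):
    """Replace missing values, then z-score normalize numeric columns."""
    out = dict(row)
    for col in params["string"]:
        if out[col] is None:
            out[col] = MISSING_STRING
    for col in params["numeric"]:
        value = params["means"][col] if out[col] is None else out[col]
        # Mean imputation leaves the column mean unchanged, so the same
        # mean also serves as the z-score centre.
        out[col] = (value - params["means"][col]) / params["stds"][col]
    return out


# Made-up toy rows, not taken from the Titanic data
train = [
    {"Age": 20.0, "Sex": "male"},
    {"Age": None, "Sex": None},
    {"Age": 40.0, "Sex": "female"},
]
params = fit_preprocessing(train, numeric_cols=["Age"], string_cols=["Sex"])
print(apply_preprocessing(train[1], params))  # prints: {'Age': 0.0, 'Sex': '?'}
```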
Model training
- Train a logistic regression model on the training data, using the non-ID columns as features and the Survived column as the target.
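To make the training step concrete, here is a minimal logistic regression sketch in plain Python. It uses batch gradient descent on the log loss, a technique swapped in for simplicity; it is not necessarily the solver the KNIME Logistic Regression Learner is configured with. It also assumes purely numeric, already normalized features; nominal columns such as Sex would first need to be encoded as numbers. The learning rate, epoch count, and toy data are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one row: p(Survived = 1) minus the label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability of the positive class (Survived = 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy one-feature data (hypothetical), already z-score normalized
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = train_logistic_regression(X, y)
print(predict_proba(w, b, [2.0]) > 0.5)  # prints: True
```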
Capture inference workflow
- The inference workflow is enclosed by the Capture Workflow Start and Capture Workflow End nodes, which extract the enclosed workflow snippet into a dedicated workflow port object.
- The inference workflow applies the preprocessing operations followed by the model predictor to create the classification output. In addition to the predicted class, the predictor outputs the class probabilities, which Giskard uses.
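The captured inference workflow behaves like a single function from raw rows to a predicted class plus per-class probabilities. The following minimal sketch shows that composition in Python; `make_inference_fn`, the stand-in preprocessing, and the stand-in model are all hypothetical and only illustrate why the probabilities travel alongside the predicted label.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Prediction = Tuple[str, Dict[str, float]]


def make_inference_fn(
    preprocess: Callable[[Row], Row],
    positive_proba: Callable[[Row], float],
    labels: Tuple[str, str] = ("0", "1"),
) -> Callable[[List[Row]], List[Prediction]]:
    """Bundle preprocessing and prediction into one callable."""
    def infer(rows: List[Row]) -> List[Prediction]:
        results = []
        for row in rows:
            p1 = positive_proba(preprocess(row))
            probs = {labels[0]: 1.0 - p1, labels[1]: p1}
            predicted = max(probs, key=probs.get)  # most probable class
            results.append((predicted, probs))
        return results
    return infer


# Hypothetical stand-ins for the fitted preprocessing and model
def preprocess(row: Row) -> Row:
    return {"Age": (row["Age"] - 30.0) / 10.0}


def positive_proba(row: Row) -> float:
    return 0.8 if row["Age"] < 0 else 0.3


infer = make_inference_fn(preprocess, positive_proba)
for predicted, probs in infer([{"Age": 20.0}, {"Age": 50.0}]):
    print(predicted, probs)
```

Keeping preprocessing inside the same callable means any perturbation a scanner applies to the raw inputs passes through exactly the same transformations the model saw during training.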
Evaluate inference workflow with Giskard
- The Giskard Scanner node checks the inference workflow for common weaknesses in the categories of robustness, spurious correlation, performance, and overconfidence.
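As a rough illustration of one of these categories, the sketch below flags overconfident mistakes: wrong predictions made with a high predicted-class probability. This toy version only conveys the idea; it is not the detector Giskard actually runs, and the function name, the 0.9 threshold, and the example outputs are hypothetical.

```python
def overconfident_mistakes(predictions, true_labels, threshold=0.9):
    """Return indices of wrong predictions made with probability >= threshold.

    Toy illustration of an overconfidence check, not Giskard's detector.
    """
    flagged = []
    for i, ((predicted, probs), truth) in enumerate(zip(predictions, true_labels)):
        if predicted != truth and probs[predicted] >= threshold:
            flagged.append(i)
    return flagged


# Hypothetical model outputs: (predicted class, per-class probabilities)
preds = [
    ("1", {"0": 0.05, "1": 0.95}),  # wrong and very confident
    ("0", {"0": 0.60, "1": 0.40}),  # wrong but hesitant
    ("1", {"0": 0.02, "1": 0.98}),  # correct
]
print(overconfident_mistakes(preds, ["0", "1", "1"]))  # prints: [0]
```

A check like this needs the class probabilities, not just the predicted label, which is one reason the inference workflow passes them on to Giskard.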

Data used in this workflow:

Titanic - Machine Learning from Disaster. Retrieved from https://www.kaggle.com/c/titanic