This workflow uses the Giskard Scanner node to scan a machine learning workflow for common weaknesses.
You can download and run the workflow directly in your KNIME Analytics Platform. For optimal performance, we recommend using the latest version of KNIME AP.
Workflow steps
Reading the data
The data is embedded in the workflow, so you can simply execute the node.
Partition into train and test
The Partitioning node splits the data into 70% training data and 30% test data. The test data is used later to evaluate the model with Giskard.
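For readers who think in code, the split can be sketched in plain Python. This is only an illustration that assumes a random draw with a fixed seed; the `partition` helper, the seed, and the toy `PassengerId` rows are hypothetical and not taken from the workflow.

```python
import random

def partition(rows, train_fraction=0.7, seed=42):
    """Shuffle the rows and split them into (train, test) lists."""
    shuffled = list(rows)  # copy so the caller's data stays untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy stand-in for the Titanic table (hypothetical rows).
rows = [{"PassengerId": i} for i in range(1, 11)]
train, test = partition(rows)
```

Fixing the seed keeps the split reproducible, so the same test rows are evaluated on every run.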
Preprocessing
Handle missing values by replacing missing strings with '?' and missing numerical values with the column mean.
Z-score normalize numerical columns for more stable logistic regression results.
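The two preprocessing steps can be sketched in plain Python as follows. The sketch assumes that the mean and standard deviation are learned on the training data and then reused unchanged at inference time, and that the sample standard deviation is used; both are assumptions about the node configuration. The helper names and the `Age` and `Cabin` columns are illustrative only.

```python
from statistics import mean, stdev

def fit_preprocessing(train_rows, numeric_cols):
    """Learn the imputation mean and z-score parameters from the training rows."""
    params = {}
    for col in numeric_cols:
        col_mean = mean(r[col] for r in train_rows if r[col] is not None)
        # Impute first, then measure the spread, mirroring the order of the two steps.
        imputed = [col_mean if r[col] is None else r[col] for r in train_rows]
        col_std = stdev(imputed) or 1.0  # fall back to 1.0 for a constant column
        params[col] = (col_mean, col_std)
    return params

def apply_preprocessing(rows, params, string_cols):
    """Return new rows with missing values filled and numeric columns z-scored."""
    result = []
    for row in rows:
        new_row = dict(row)
        for col in string_cols:
            if new_row[col] is None:
                new_row[col] = "?"
        for col, (col_mean, col_std) in params.items():
            value = col_mean if new_row[col] is None else new_row[col]
            new_row[col] = (value - col_mean) / col_std
        result.append(new_row)
    return result

train = [
    {"Age": 20.0, "Cabin": "C85"},
    {"Age": None, "Cabin": None},
    {"Age": 40.0, "Cabin": "E46"},
]
params = fit_preprocessing(train, ["Age"])
clean = apply_preprocessing(train, params, ["Cabin"])
```

Keeping the fitted parameters separate from their application is what allows the same transformation to be replayed on the test data.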
Model training
Train a logistic regression model on the dataset using the non-ID columns as features and the survived column as target.
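As a rough illustration of this step, the sketch below selects every column except the ID and the target as a feature and fits a logistic regression with plain batch gradient descent. Gradient descent is a stand-in chosen for brevity and is not necessarily the solver the learner node uses; the `PassengerId` column name, the toy rows, and the hyperparameters are assumptions.

```python
import math

def train_logistic_regression(X, y, learning_rate=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, target in zip(X, y):
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = 1.0 / (1.0 + math.exp(-z)) - target  # predicted probability minus label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(X)
    return weights, bias

# Toy, already z-scored rows (hypothetical values).
rows = [
    {"PassengerId": 1, "Age": -1.0, "survived": 1},
    {"PassengerId": 2, "Age": -0.5, "survived": 1},
    {"PassengerId": 3, "Age": 0.5, "survived": 0},
    {"PassengerId": 4, "Age": 1.0, "survived": 0},
]
# Every column except the ID and the target is used as a feature.
feature_cols = [c for c in rows[0] if c not in ("PassengerId", "survived")]
X = [[r[c] for c in feature_cols] for r in rows]
y = [r["survived"] for r in rows]
weights, bias = train_logistic_regression(X, y)
```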
Capture inference workflow
The inference workflow is enclosed by the Capture Workflow Start and Capture Workflow End nodes, which extract the contained workflow snippet into a dedicated workflow port object.
The inference workflow applies the preprocessing operations followed by the model predictor to create the classification output. Besides the predicted class, the predictor also outputs the class probabilities, which are used by Giskard.
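A loose Python analogy for the captured workflow is a single callable that bundles the fitted preprocessing parameters with the model and returns both the predicted class and the class probabilities. The function name, output keys, and parameter values below are hypothetical and chosen only for illustration.

```python
import math

def make_inference(params, weights, bias):
    """Bundle preprocessing and prediction into one callable."""
    def infer(row):
        z = bias
        for col, (col_mean, col_std) in params.items():
            # Preprocessing: mean-impute, then z-score the feature.
            value = col_mean if row[col] is None else row[col]
            z += weights[col] * (value - col_mean) / col_std
        p_survived = 1.0 / (1.0 + math.exp(-z))  # logistic function
        return {
            "prediction": 1 if p_survived >= 0.5 else 0,
            "P(survived=1)": p_survived,
            "P(survived=0)": 1.0 - p_survived,
        }
    return infer

# Hypothetical fitted values, not taken from the workflow.
infer = make_inference(params={"Age": (30.0, 10.0)}, weights={"Age": -0.5}, bias=0.0)
older = infer({"Age": 50.0})
missing = infer({"Age": None})
```

Because the preprocessing lives inside the callable, a scanner can feed it raw, possibly perturbed rows and still get valid predictions.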
Evaluate inference workflow with Giskard
The Giskard Scanner node checks the inference workflow for common weaknesses in four categories: robustness, spurious correlation, performance, and overconfidence.
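To give an intuition for the robustness category, the sketch below perturbs one numerical feature by a small relative amount and measures how often the predicted class changes. This is not Giskard's implementation; the toy `predict` function, its weights, and the 5% perturbation are assumptions made for illustration.

```python
import math

def predict(row):
    """Toy stand-in for the captured inference workflow (hypothetical weights)."""
    z = -0.05 * (row["Age"] - 30.0)
    p_survived = 1.0 / (1.0 + math.exp(-z))
    return 1 if p_survived >= 0.5 else 0

def flip_rate(predict_fn, rows, column, relative_change=0.05):
    """Fraction of rows whose predicted class changes after a small perturbation."""
    flips = 0
    for row in rows:
        perturbed = dict(row)
        perturbed[column] = row[column] * (1.0 + relative_change)
        if predict_fn(perturbed) != predict_fn(row):
            flips += 1
    return flips / len(rows)

test_rows = [{"Age": 10.0}, {"Age": 29.0}, {"Age": 30.0}, {"Age": 60.0}]
rate = flip_rate(predict, test_rows, "Age")
```

In this toy example the two rows near the decision boundary flip, so `rate` is 0.5. A high flip rate under small perturbations suggests the model is sensitive to noise in that feature.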
Data used in this workflow:
Titanic - Machine Learning from Disaster. Retrieved from https://www.kaggle.com/c/titanic