SHAP Loop Start

SHAP stands for SHapley Additive exPlanations, a unified approach to explaining the predictions of any machine learning model. For a single output (e.g. the probability of the positive class in a binary classification), it assigns each feature a Shapley Value that quantifies how that feature changed the output. If the model has multiple outputs, one set of Shapley Values is calculated per output. The Shapley Values for a single output sum to the deviation of the prediction from the mean prediction (also called the null prediction), which is the prediction the model would have made if no feature had been available.

KNIME Analytics Platform offers a second way to calculate Shapley Values, the Shapley Values loop nodes. Unlike those nodes, SHAP can also find sparse explanations via LASSO regularization. This lets you set the maximal number of features in an explanation, which makes explanations far easier to understand when the data has hundreds or thousands of features. If a maximal number of features is specified, SHAP finds, for each row to explain, the features that have the most impact on its prediction and considers only those when calculating the Shapley Values.
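The additivity property described above can be checked on a small example. The following is a minimal sketch, not the node's implementation: it computes exact Shapley Values by enumerating all feature subsets, whereas the node itself estimates them. The toy `model`, the `background` rows (standing in for a sampling table) and all function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical toy model: linear terms plus one interaction term.
    return 2.0 * x[0] + 1.0 * x[1] - 3.0 * x[2] + 0.5 * x[0] * x[1]


def coalition_value(row, background, present):
    # Average model output when the features in `present` keep the values
    # of `row` and all other features are replaced by background values.
    total = 0.0
    for bg in background:
        mixed = [row[i] if i in present else bg[i] for i in range(len(row))]
        total += model(mixed)
    return total / len(background)


def shapley_values(row, background):
    # Exact Shapley Values via enumeration of all subsets (assumption:
    # only feasible for a handful of features).
    n = len(row)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = coalition_value(row, background, set(subset) | {i})
                without_i = coalition_value(row, background, set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi


background = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0]]
row = [3.0, 1.0, 1.0]

phi = shapley_values(row, background)
# Null prediction: no feature of `row` is available to the model.
mean_prediction = coalition_value(row, background, set())
prediction = model(row)

print("Shapley Values:", phi)
print("Sum of Shapley Values:", sum(phi))
print("Prediction minus mean prediction:", prediction - mean_prediction)
```

The last two printed numbers agree up to floating-point error, which is the additivity property: the Shapley Values distribute the gap between the row's prediction and the mean prediction across the features.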

Usage

The first input table contains the rows of interest (ROIs), i.e. the rows for which an explanation is required. The SHAP algorithm replaces certain subsets of a ROI's features and observes how the model output changes. The replacement values are taken from the second input table, the sampling table. Note that, in contrast to the Shapley Values and LIME loops, the sampling table should not be much larger than 100 rows in order to keep the runtime reasonable; SHAP is usually still on par with the other methods.

The output of the SHAP Loop Start node contains only the columns selected as feature columns in the dialog. In the loop body, use the model whose predictions you want to understand to predict this table, then feed the table with the appended predictions into the SHAP Loop End node, which calculates the explanations. SHAP can only explain numerical predictions, so for a classification task the predictor must be configured to output probabilities.

The SHAP loop has n + 1 iterations, where n is the number of ROIs (rows of the first input table). The first iteration does not explain a ROI; instead, it estimates the mean prediction by letting the model predict the sampling table (the second input of the SHAP Loop Start node).
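The control flow of the loop can be sketched outside of KNIME as follows. This is a simplified illustration under stated assumptions, not the node's actual algorithm: `predict_probability` is a hypothetical stand-in for the predictor node, `shap_loop_tables` is a hypothetical stand-in for the SHAP Loop Start node, and the way feature subsets are drawn here (random masks, each paired with every sampling row) is an assumption made for the example.

```python
import math
import random


def predict_probability(rows):
    # Hypothetical predictor: returns a positive-class probability per row,
    # i.e. a numerical prediction as the loop requires.
    return [1.0 / (1.0 + math.exp(-(r[0] - r[1] + 0.5 * r[2]))) for r in rows]


def shap_loop_tables(rois, sampling, n_coalitions=5, seed=0):
    """Yield one table per loop iteration (hypothetical simplification)."""
    rng = random.Random(seed)
    # Iteration 0: the sampling table, used to estimate the mean prediction.
    yield [list(r) for r in sampling]
    # Iterations 1..n: one table of perturbed rows per row of interest.
    for roi in rois:
        table = []
        for _ in range(n_coalitions):
            # Features marked True keep the ROI value; the rest are replaced.
            keep = [rng.random() < 0.5 for _ in roi]
            for bg in sampling:
                table.append([v if k else b for v, k, b in zip(roi, keep, bg)])
        yield table


rois = [[1.0, 0.0, 2.0], [0.5, 1.5, -1.0]]
sampling = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.5, -0.5]]

mean_prediction = None
table_sizes = []
for iteration, table in enumerate(shap_loop_tables(rois, sampling)):
    predictions = predict_probability(table)  # the loop body
    table_sizes.append(len(table))
    if iteration == 0:
        mean_prediction = sum(predictions) / len(predictions)
    # A loop end would compute the explanation for the ROI from
    # `predictions` here; that step is omitted in this sketch.

print("Iterations:", len(table_sizes))
print("Rows per iteration:", table_sizes)
print("Estimated mean prediction:", mean_prediction)
```

In this sketch the table for each ROI grows with the number of sampling rows, which gives one plausible intuition for why a small sampling table keeps the runtime reasonable.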

Node details


Input ports

Type: Table
Table containing the rows to explain
Table containing the rows to be explained.
Type: Table
Sampling data
Table containing rows for sampling.

Output ports

Type: Table
Predictable table
This table contains rows that have to be predicted by the predictor node corresponding to the model whose predictions you want to explain.

Extension

The SHAP Loop Start node is part of this extension: