This component generates an interactive visualization to help the user understand their model’s behavior on a single example data point. It works in two main steps.
COUNTERFACTUALS
1) The component synthetically oversamples the sample dataset (the first table input, connected to the second input port) by shuffling features and then applying the SMOTE algorithm. The expanded dataset is then searched for nearby data points that, when scored by the input model, are assigned the class selected in the component configuration dialog. We call these data points Counterfactuals.
* Nearby data points here means data points with as little numeric variation from the original as possible. For example, we might call <1,1> and <1,1.1> close but <1,1> and <17,23> distant.
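The counterfactual search described above can be sketched roughly as follows. This is a minimal illustration, not the component's actual implementation: the helper names (interpolate_samples, find_counterfactuals, toy_predict) are hypothetical, the oversampling is a simplified SMOTE-style interpolation between random pairs of samples (real SMOTE interpolates toward nearest neighbors of the same class, and the feature shuffling step is omitted), and the Manhattan Distance is assumed as the measure of "numeric variation".

```python
import random

def manhattan(a, b):
    # Sum of absolute differences between corresponding elements.
    return sum(abs(x - y) for x, y in zip(a, b))

def interpolate_samples(samples, n_new, rng):
    # Simplified SMOTE-style oversampling (assumption): each new point lies
    # on the line segment between two randomly chosen existing samples.
    new_points = []
    for _ in range(n_new):
        a, b = rng.sample(samples, 2)
        t = rng.random()
        new_points.append([x + t * (y - x) for x, y in zip(a, b)])
    return new_points

def find_counterfactuals(point, candidates, predict, desired_class, k=3):
    # Keep candidates the model assigns to the desired class,
    # then return the k closest to the original point.
    hits = [c for c in candidates if predict(c) == desired_class]
    return sorted(hits, key=lambda c: manhattan(point, c))[:k]

def toy_predict(row):
    # Hypothetical stand-in for the input model.
    return "yes" if row[0] + row[1] > 3.0 else "no"

rng = random.Random(0)
samples = [[1.0, 1.0], [1.0, 1.1], [2.0, 2.5], [17.0, 23.0]]
candidates = samples + interpolate_samples(samples, 50, rng)

point = [1.0, 1.0]  # the record to explain, classified "no" by toy_predict
counterfactuals = find_counterfactuals(point, candidates, toy_predict, "yes")
for c in counterfactuals:
    print(c, round(manhattan(point, c), 3))
```

Oversampling before the search fills gaps between the original samples, which makes it more likely that a candidate of the desired class exists close to the record being explained.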
FEATURE IMPORTANCE FROM GLM
2) Next, we define a neighborhood around the original data point. This might mean, for example, all data points with less than 0.5 numeric variation from the original point. Specifically, we use the Manhattan Distance on the normalized data points, and we choose the smallest neighborhood that includes examples of the desired class from the configuration dialog. On this set of data points we train a Surrogate GLM to mimic the input model. From this model we extract and normalize the coefficients and display them in a bar chart as a local feature importance measure.
* The Manhattan Distance between two vectors is the sum of the absolute differences between corresponding elements. For example, the Manhattan Distance between <1,2> and <1.5,1> is:
|1.5-1| + |1-2| = 0.5 + 1 = 1.5
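The neighborhood and surrogate steps can be sketched as below. This is an illustration under stated assumptions rather than the component's implementation: min-max scaling is assumed as the normalization, a logistic regression fitted by plain gradient descent stands in for the Surrogate GLM, coefficients are normalized by dividing their absolute values by their sum, and all function names are hypothetical. The labels play the role of the input model's predictions.

```python
import math

def manhattan(a, b):
    # Sum of absolute differences between corresponding elements.
    return sum(abs(x - y) for x, y in zip(a, b))

def min_max_normalize(rows):
    # Scale every column to [0, 1] (assumed normalization).
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(r, lows, spans)] for r in rows]

def smallest_neighborhood(center, rows, labels, desired_class):
    # Smallest radius around center that still contains a desired-class row.
    dists = [manhattan(center, r) for r in rows]
    radius = min(d for d, l in zip(dists, labels) if l == desired_class)
    members = [i for i, d in enumerate(dists) if d <= radius]
    return radius, members

def fit_logistic_glm(rows, targets, epochs=2000, lr=0.5):
    # Logistic regression (a binomial GLM) via batch gradient descent.
    w, b, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, t in zip(rows, targets):
            z = b + sum(wi * xi for wi, xi in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - t
            grad_w = [g + err * x for g, x in zip(grad_w, row)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def normalized_importance(weights):
    # Absolute coefficients rescaled so that they sum to 1.
    total = sum(abs(w) for w in weights) or 1.0
    return [abs(w) / total for w in weights]

print(manhattan([1, 2], [1.5, 1]))  # the worked example above

raw = [[1.0, 2.0], [1.5, 1.0], [2.0, 3.0], [3.0, 2.0], [3.5, 1.5], [9.0, 9.0]]
labels = ["no", "no", "no", "yes", "yes", "yes"]
rows = min_max_normalize(raw)

radius, members = smallest_neighborhood(rows[0], rows, labels, "yes")
local_rows = [rows[i] for i in members]
local_targets = [1 if labels[i] == "yes" else 0 for i in members]

weights, bias = fit_logistic_glm(local_rows, local_targets)
importance = normalized_importance(weights)
print(radius, members, importance)
```

Fitting the surrogate only on the neighborhood keeps the explanation local: the coefficients describe which features separate the desired class from the original class near the record being explained, not across the whole dataset.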
HOW TO USE
1) Drag and drop the component into your workflow
2) To the first input port connect a model captured by the integrated deployment framework, such as a model from the AutoML component
3) To the second input port connect a sample set of data points. These will be used to generate artificial data points to be used as counterfactual candidates.
4) To the third input port connect a table containing one row: the data point for which you want to generate model explanations and counterfactuals.
5) In the configuration dialog, select the target column and the class desired in the counterfactuals. This class must be different from the value in the data point from input 3.
6) Feel free to tune the other configuration parameters. Higher oversampling and permutation rates will reduce sparsity in the data but increase processing time. If you raise these settings, or if you are using a large sample set in input 2, it is recommended to increase the expansion rate parameter.
- Type: Workflow Port Object
  Integrated Deployment Port
  Workflow port from a model captured with integrated deployment functionality, such as from the AutoML component.
- Type: Table
  Sample Data
  Extra labeled samples. Must include some examples of the desired class.
- Type: Table
  Record to Explain
  Individual data point that will be explained and act as the center point of the neighborhood.