This Component can be connected to the bottom input port of the SHAP Loop Start node. It uses k-means to summarize the validation set into a sampling table, which is then used when creating coalitions.
The created sampling table has n rows, each of which is a different prototype of the data. n can be adjusted in the configuration dialogue of the Component and defaults to 100.
For each of the n clusters found by k-means, the output sampling table contains one prototype row, together with a column "SHAP Summarizer Sampling weight" that the SHAP Loop Start node can use.
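The summarization described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Component's implementation. The helper name `kmeans_summarize` is hypothetical. Each prototype is taken to be a cluster centroid, and its weight is assumed to be the fraction of validation rows assigned to that cluster. The description above does not specify how the "SHAP Summarizer Sampling weight" value is computed. Only numeric columns are handled here.

```python
import numpy as np


def kmeans_summarize(X, n=100, n_iter=50, seed=0):
    """Summarize numeric rows X (shape [rows, features]) into at most n weighted prototypes.

    Hypothetical helper: plain Lloyd's k-means. Each prototype is a cluster centroid and
    its weight is the fraction of input rows assigned to that cluster (an assumption).
    """
    X = np.asarray(X, dtype=float)
    n = min(n, len(X))  # cannot have more clusters than rows
    rng = np.random.default_rng(seed)
    # Initialise centroids with n rows chosen at random without replacement.
    centroids = X[rng.choice(len(X), size=n, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every row to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each non-empty cluster's centroid to the mean of its rows.
        new_centroids = centroids.copy()
        for k in range(n):
            members = X[labels == k]
            if len(members):
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the last centroids, then turn cluster sizes into weights.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    counts = np.bincount(labels, minlength=n)
    keep = counts > 0  # drop clusters that ended up empty
    return centroids[keep], counts[keep] / len(X)


prototypes, weights = kmeans_summarize([[0, 0], [0, 1], [10, 10], [10, 11]], n=2)
print(prototypes, weights)
```

The default `n=100` mirrors the Component's default. With two well-separated groups of two rows each and `n=2`, the sketch returns one centroid per group, and each centroid has weight 0.5.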
This Component can summarize data of the following domains: Number (integer), Number (double) and String.
DISCLAIMER: the statistical soundness of the Component's sampling is not guaranteed when the input table contains String columns. Current computer science research is still looking for a more solid solution than training k-means on one-hot encoded categorical columns and then decoding the results back into categories.
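The limitation stated in the disclaimer can be illustrated with a small Python sketch. It assumes a common scheme: one 0/1 column per category, with decoding done by picking the highest-scoring column. The Component's actual encoding is not documented here, and the helper names `one_hot_encode` and `decode_one_hot` are hypothetical. A k-means centroid of one-hot columns is a vector of category shares. Decoding it back to a single String discards every share except the largest.

```python
import numpy as np


def one_hot_encode(values):
    """Encode a list of strings as a 0/1 matrix; returns (matrix, sorted categories)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    matrix = np.zeros((len(values), len(categories)))
    for row, value in enumerate(values):
        matrix[row, index[value]] = 1.0
    return matrix, categories


def decode_one_hot(rows, categories):
    """Map each (possibly fractional) one-hot row back to its highest-scoring category."""
    rows = np.atleast_2d(rows)
    return [categories[i] for i in rows.argmax(axis=1)]


# A cluster whose String column holds "red", "red", "blue".
matrix, categories = one_hot_encode(["red", "red", "blue"])
centroid = matrix.mean(axis=0)  # shares per category: blue 1/3, red 2/3
print(categories, centroid, decode_one_hot(centroid, categories))
```

The decoded prototype is simply `"red"`, so the one-third share of `"blue"` in that cluster is no longer represented. This is one reason why sampling quality with String columns is not guaranteed.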
- Type: Table
  Validation Data: Data to be summarized, containing all the features SHAP Loop Start needs. Supported domains: Number (double), Number (integer), String.