Binary Classification Inspector
This node produces a complex view made of four different charts in order to compare, optimize and select predictions of different binary classifiers:
- Compare a number of binary classifier machine learning models predicting the same target on the same test data using performance metrics and ROC curves
- Optimize a model by finding the best threshold given a performance metric of your choice
- Interactively select a given type of predictions (e.g. true positives) of one of the models and export them at the output of the node
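As a rough illustration of the first point, the comparison of several models on the same test data can be sketched outside the node in plain Python. The column names (`model_a`, `model_b`), the sample data and the helper `roc_auc` are hypothetical and are not part of the node; the sketch assumes the ground truth is encoded as 0/1 and that each model column holds the positive-class probability.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive row outranks a
    random negative row (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical test data: one ground-truth column, one probability column per model.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_probs = {
    "model_a": [0.9, 0.2, 0.8, 0.6, 0.4, 0.1, 0.7, 0.3],
    "model_b": [0.6, 0.5, 0.7, 0.4, 0.6, 0.2, 0.8, 0.3],
}

# One AUC value per model; the model with the highest AUC is the candidate to inspect.
auc_by_model = {name: roc_auc(y_true, probs) for name, probs in model_probs.items()}
best_model = max(auc_by_model, key=auc_by_model.get)
```

The pairwise formulation is quadratic in the number of rows, which is fine for a sketch but not for large test sets.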
The user journey when using this view follows these steps:
- Compare model AUC to select the best model via the Model's Statistics bar chart and the Model's ROC Curves chart in the top panel.
- Change the threshold from its initial value either manually, via the Threshold Slider, or automatically, by maximizing one of the available functions (e.g. F-Measure) via the Model Tab Dropdown.
- Look back at the top panel to see how the new threshold impacts the model when compared with the other models.
- Inspect the Confusion Matrix in the bottom panel to assess the gravity of the misclassifications, given the associated probability (the confidence of the model) shown on the Classification Distribution chart.
- Combine this view with other KNIME views in a Component to interactively visualize different types of predictions (e.g. false positives) with interactive selection events.
- After interacting with the view, use the "Apply" features to export the new threshold and the selected model from the node as flow variables. This will also export: A) the selected model's predictions, B) the selection of the confusion matrix cell, C) the selected model's performance statistics.
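The automatic threshold step above can be approximated by a simple search. This is a minimal sketch, not the node's implementation: it assumes a row is labeled positive when its probability is greater than or equal to the threshold, and it only tries the observed probabilities as candidate thresholds. The function names and the sample data are hypothetical.

```python
def f_measure(y_true, probs, threshold):
    """F1 score when rows with prob >= threshold are labeled positive (assumed rule)."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, probs):
    """Return the observed probability that maximizes the F-measure.
    On ties, the smallest such threshold wins."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda t: f_measure(y_true, probs, t))


# Hypothetical ground truth and positive-class probabilities for one model.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [0.6, 0.5, 0.7, 0.4, 0.6, 0.2, 0.8, 0.3]

threshold = best_threshold(y_true, probs)
score = f_measure(y_true, probs, threshold)
```

Swapping `f_measure` for another metric function with the same signature gives the same search for a different optimization target.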
The node supports custom CSS styling. You can simply put CSS rules into a single string and set it as a flow variable 'customCSS' in the node configuration dialog. You will find the list of available classes and their description on our documentation page.
- Type: Data Data table with a column for the ground truth (binary/nominal/class/0-1/etc.) and a column for each model containing the positive prediction probability of that model.
- Type: Data A table with one COLORED row for each model column (NOT ground truth) with a single column containing the name of the column.
- Type: Data The data table at this out-port is derived from the data input at in-port #0 and can have a number of options. It will always contain:
- The ground truth column selected in the dialog.
- The model-prediction columns chosen in the dialog.
- A boolean "selected" column with a value of "true" or "false" depending on whether or not the row was selected in the confusion matrix in the view. The value will be "missing" if no selections were made or the view was not opened.
Optionally, the data table may also contain:
- A String-value column for EACH of the model-prediction columns chosen in the dialog. Each column will contain the label (either positive class or negative class) depending on the threshold for that model. The thresholds will either be taken from the dialog "Threshold Method" option OR be taken from the most recent value saved from the view for that model. To see which threshold was used to label each column, reference the row of statistics relating to each model in the 2nd out-port. To enable this option select the "Append new predictions for all models" option in the dialog.
- A single String-value column for the selected model in the view. This column will contain the label (either positive class or negative class) depending on the threshold for the selected model in the view. This column will contain "missing values" if no model was selected in the view. The name of this column can also be set in the dialog to assist in downstream workflow processes. The selected model column name and threshold will also be output as flow variables from this node.
To enable this option select the "Append prediction column for the selected model" option in the dialog.
The final option for this table is to exclude columns that are neither the ground truth column nor the selected model-prediction column(s). This filters out any unwanted columns that may have been left over from upstream predictors. This option is NOT enabled by default. To take advantage of it, DESELECT the "Retain non-prediction columns in output table" option in the dialog.
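To make the appended columns more concrete, the sketch below derives a label column from a threshold and a boolean "selected" column from a chosen confusion matrix cell. It is only an illustration under stated assumptions: the class names (`yes`, `no`), the `>=` comparison, the cell codes (`TP`, `FP`, `FN`, `TN`) and both helper functions are hypothetical and do not come from the node.

```python
def label_rows(probs, threshold, positive, negative):
    """Turn positive-class probabilities into class labels (assumed rule: prob >= threshold)."""
    return [positive if p >= threshold else negative for p in probs]


def confusion_cell(truth_label, predicted_label, positive):
    """Name the confusion matrix cell a single row falls into."""
    if predicted_label == positive:
        return "TP" if truth_label == positive else "FP"
    return "FN" if truth_label == positive else "TN"


# Hypothetical input rows: ground truth plus one model's probabilities.
truth = ["yes", "no", "yes", "no"]
probs = [0.9, 0.7, 0.3, 0.1]

# Appended prediction column for the selected model at threshold 0.5.
predicted = label_rows(probs, 0.5, positive="yes", negative="no")

# Boolean "selected" column for a selection of the false-positive cell.
cells = [confusion_cell(t, p, "yes") for t, p in zip(truth, predicted)]
selected = [cell == "FP" for cell in cells]
```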
- Type: Data Table containing a single row of statistics for each of the models chosen. This table also contains a column with the threshold value used in the calculation of the statistics in this table and the (potential) labeling of the output data.
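The shape of such a statistics table, one row per model with its threshold, can be sketched as follows. The statistics shown (accuracy, precision, recall), the dictionary keys and the sample data are assumptions chosen for illustration; the actual columns produced by the node may differ.

```python
def stats_row(name, y_true, probs, threshold):
    """Build one row of statistics for a model at a given threshold
    (assumed labeling rule: prob >= threshold is positive)."""
    pairs = [(y, p >= threshold) for y, p in zip(y_true, probs)]
    tp = sum(1 for y, pred in pairs if y == 1 and pred)
    fp = sum(1 for y, pred in pairs if y == 0 and pred)
    fn = sum(1 for y, pred in pairs if y == 1 and not pred)
    tn = sum(1 for y, pred in pairs if y == 0 and not pred)
    return {
        "model": name,
        "threshold": threshold,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical ground truth, model columns and per-model thresholds.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
models = {
    "model_a": ([0.9, 0.2, 0.8, 0.6, 0.4, 0.1, 0.7, 0.3], 0.5),
    "model_b": ([0.6, 0.5, 0.7, 0.4, 0.6, 0.2, 0.8, 0.3], 0.4),
}

stats_table = [stats_row(name, y_true, probs, t) for name, (probs, t) in models.items()]
```

Keeping the threshold in each row makes it possible to trace which cutoff produced both the statistics and any labels appended to the data table.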