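This node computes the Silhouette Coefficient for the provided clustering result. The Silhouette Coefficient is a useful metric for evaluating clustering performance. For each row, it is computed as (b - a) / max(a, b), where a is the mean distance from the row to all other rows in its own cluster (mean intra-cluster distance) and b is the smallest mean distance from the row to the rows of any other cluster (mean distance to the nearest cluster). Additionally, a second table is produced that contains the mean over all individual Silhouette Coefficients. The score ranges from -1.0 to 1.0, and higher is better: values near 1.0 indicate that a row lies well inside its cluster, values near 0 indicate that it lies close to a cluster boundary, and negative values suggest that it may have been assigned to the wrong cluster. At least two clusters are required for the score to be computable.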
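The per-row formula and the mean summary can be sketched in plain Python as below. This is a minimal illustration, not the node's implementation: the function names are hypothetical, rows are assumed to be numeric tuples, and a row that is alone in its cluster is assumed to score 0 (the convention scikit-learn uses), since the description above does not say how such rows are handled.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclidean(p: Point, q: Point) -> float:
    # Default distance: straight-line distance between two rows.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def silhouette_coefficients(rows: Sequence[Point], labels: Sequence[str]) -> List[float]:
    """Return one Silhouette Coefficient per row, s = (b - a) / max(a, b)."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    scores: List[float] = []
    for i, (row, own) in enumerate(zip(rows, labels)):
        # a: mean distance to every other row of the same cluster
        same = [euclidean(row, rows[j]) for j in range(len(rows))
                if j != i and labels[j] == own]
        if not same:
            # Assumption: a row alone in its cluster scores 0.
            scores.append(0.0)
            continue
        a = mean(same)
        # b: smallest mean distance to the rows of any other cluster
        b = min(
            mean([euclidean(row, rows[j]) for j in range(len(rows)) if labels[j] == other])
            for other in clusters if other != own
        )
        denom = max(a, b)
        # Guard against 0 / 0 when all involved rows coincide.
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return scores


def mean_silhouette(rows: Sequence[Point], labels: Sequence[str]) -> float:
    # Summary value: the mean over all individual coefficients.
    return mean(silhouette_coefficients(rows, labels))
```

For two well-separated pairs of points, such as rows (0, 0) and (0, 1) in one cluster and (5, 0) and (5, 1) in another, every row scores about 0.80, so the mean is also about 0.80.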
By default, the Euclidean distance is used to calculate distances between rows. This can be changed by connecting an optional distance function. If a distance function is connected, the column selection in the dialog is ignored, because the columns used are then configured in the connected distance function.
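The sketch below illustrates the idea of a pluggable distance function in Python: the distance is passed in as a callable that already knows which columns it reads, so no separate column selection is needed. The factories euclidean_on and manhattan_on, the dictionary row layout, and the singleton convention (score 0) are assumptions for illustration only; they do not reflect how distance functions are configured inside the node.

```python
import math
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]
Distance = Callable[[Row, Row], float]


def euclidean_on(columns: Sequence[str]) -> Distance:
    """Hypothetical factory: Euclidean distance that reads only `columns`."""
    def dist(p: Row, q: Row) -> float:
        return math.sqrt(sum((p[c] - q[c]) ** 2 for c in columns))
    return dist


def manhattan_on(columns: Sequence[str]) -> Distance:
    """Hypothetical factory: Manhattan (city-block) distance over `columns`."""
    def dist(p: Row, q: Row) -> float:
        return sum(abs(p[c] - q[c]) for c in columns)
    return dist


def silhouette_coefficients(rows: Sequence[Row], labels: Sequence[str],
                            distance: Distance) -> List[float]:
    # The distance callable alone decides which columns matter.
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    members = {c: [j for j, lab in enumerate(labels) if lab == c] for c in clusters}
    scores: List[float] = []
    for i, own in enumerate(labels):
        peers = [j for j in members[own] if j != i]
        if not peers:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        a = sum(distance(rows[i], rows[j]) for j in peers) / len(peers)
        b = min(
            sum(distance(rows[i], rows[j]) for j in members[c]) / len(members[c])
            for c in clusters if c != own
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return scores
```

With manhattan_on(["x"]), changing the y or id values of any row leaves the scores unchanged, mirroring how the dialog's column selection is ignored once a distance function is connected.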
Computing the Silhouette Coefficient is computationally expensive, because the distance between every pair of rows is needed. For large datasets, it is therefore recommended to subsample the data first.
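The following Python sketch shows why subsampling helps and one way to do it. A full pass needs a distance for every pair of rows, so the work grows quadratically with the number of rows. Stratified sampling (drawing the same fraction from each cluster) is used here as an assumed strategy so that small clusters are not lost entirely, which would otherwise distort the score or leave fewer than two clusters. The function names and the minimum of two rows per cluster are illustrative choices, not part of the node.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def pairwise_distance_count(n: int) -> int:
    """Number of distinct row pairs whose distance a full pass needs."""
    return n * (n - 1) // 2


def stratified_subsample(labels: Sequence[str], fraction: float, seed: int = 0) -> List[int]:
    """Return sorted row indices, sampling `fraction` of each cluster (hypothetical helper)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    by_cluster: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_cluster[label].append(idx)
    chosen: List[int] = []
    for members in by_cluster.values():
        # Keep at least two rows per cluster (when available) so that
        # the intra-cluster distance a remains defined.
        k = min(len(members), max(2, round(fraction * len(members))))
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)
```

For 1,010 rows, a full pass needs 509,545 distinct pairs; a 10 % stratified sample of 1,000 rows in cluster A and 10 rows in cluster B keeps 102 rows and needs only 5,151.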