This component runs the K-means algorithm and outputs the Euclidean distance between each point and the cluster centroids.
In the configuration dialog, you can select whether to calculate the distance to every cluster's centroid or only to the centroid of the cluster each point belongs to.
The clustering algorithm uses the Euclidean distance on the selected attributes. The data is not normalized by the node; if normalization is required, consider using the "Normalizer" as a preprocessing step.
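The two output modes described above can be illustrated with a minimal Python sketch. This is not the component's actual implementation: the function names (`euclidean`, `assign`, `kmeans`, `centroid_distances`), the `all_clusters` flag, and the choice of the first k points as initial centroids are all assumptions made for illustration only.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: euclidean(p, centroids[c]))
            for p in points]

def kmeans(points, k, iterations=100):
    # Assumption: the first k points serve as initial centroids (illustrative only).
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

def centroid_distances(points, centroids, labels, all_clusters=True):
    """Distances to every centroid, or only to each point's own centroid."""
    rows = []
    for p, l in zip(points, labels):
        d = [euclidean(p, c) for c in centroids]
        rows.append(d if all_clusters else [d[l]])
    return rows

# Toy data: two well-separated groups, no normalization applied.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
all_dist = centroid_distances(points, centroids, labels, all_clusters=True)
own_dist = centroid_distances(points, centroids, labels, all_clusters=False)
```

Here `all_dist` holds one distance per cluster for each row, while `own_dist` holds a single distance per row. Because the distances are computed on raw values, an attribute with a much larger range would dominate the result, which is why normalizing beforehand may be worthwhile.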
This component is free to use and modify.
Author: Andrea De Mauro, aboutbigdata.net
- Type: Table. Input: Input data for the clustering. Only numeric columns are considered in the clustering.