Spark k-Means


This node applies the Apache Spark k-means clustering algorithm. It outputs the cluster centers for a predefined number of clusters; the number of clusters is not determined dynamically. K-means performs a crisp clustering, assigning each data vector to exactly one cluster. The node does not normalize the data; if normalization is required, consider using the "Spark Normalizer" node as a preprocessing step.
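
To make the terms above concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the node's or Spark's implementation. The function names, the sample points, and the choice of passing the initial centers explicitly are assumptions made for illustration. It shows a fixed number of clusters and a crisp assignment of each vector to exactly one cluster.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the closest center: a crisp assignment to exactly one cluster."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def k_means(points, initial_centers, max_iterations=20):
    """Illustrative k-means; the number of clusters is fixed by len(initial_centers)."""
    centers = [list(c) for c in initial_centers]
    for _ in range(max_iterations):
        # Assignment step: label every point with its nearest center.
        labels = [nearest_center(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[i])  # leave an empty cluster's center unchanged
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels


# Hypothetical, already comparable two-dimensional data (no normalization needed here).
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centers, labels = k_means(points, initial_centers=[points[0], points[3]])
```

Because the assignment relies on Euclidean distance, a column with a much larger numeric range would dominate the result, which is why normalizing beforehand can matter.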

Use the Spark Cluster Assigner node to apply the learned model to unseen data.
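
Conceptually, applying a learned k-means model to unseen data means labeling each new row with its nearest cluster center. The following plain Python sketch illustrates that idea only. It is not the Spark Cluster Assigner implementation, and the center values, rows, and function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_clusters(rows, centers):
    """Return, for each row, the index of the nearest cluster center."""
    return [
        min(range(len(centers)), key=lambda i: squared_distance(row, centers[i]))
        for row in rows
    ]


# Hypothetical centers, standing in for a previously learned model.
centers = [(1.0, 1.0), (8.0, 8.0)]

# Hypothetical rows that were not part of the training data.
unseen = [(0.5, 1.5), (7.0, 9.0), (4.0, 4.0)]

assigned = assign_clusters(unseen, centers)
```

Every row receives a label, even one that lies far from all centers, because the assignment is always to the closest center.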

Input Ports

  1. Type: Spark Data. Input data (JavaRDD).

Output Ports

  1. Type: Spark Data. The input data, with each row labeled with the cluster it was assigned to.
  2. Type: Spark Model. MLlib cluster model.

Find here

Tools & Services > Apache Spark > Mining > Clustering

Make sure to have this extension installed:

KNIME Extension for Apache Spark

Update site for KNIME Analytics Platform 3.7:
KNIME Analytics Platform 3.7 Update Site
