Spark k-Means

Node / Learner

This node applies the Apache Spark k-means clustering algorithm. It outputs the cluster centers for a predefined number of clusters; the number of clusters is not determined dynamically. K-means performs a crisp clustering, assigning each data vector to exactly one cluster. The node does not normalize the data; if normalization is required, consider using the Spark Normalizer node as a preprocessing step.
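To make "predefined number of clusters" and "crisp clustering" concrete, here is a minimal pure-Python sketch of the basic k-means iteration. It is not the node's or Spark MLlib's implementation: the function names, the sample points, and the choice to seed the centers with the first k points are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def assign(point, centers):
    """Return the index of the nearest center (crisp: exactly one cluster)."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))


def k_means(points, k, max_iter=20):
    """Plain k-means iteration with a fixed k.

    Illustrative assumption: the first k points seed the centers.
    """
    centers = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: each point goes to exactly one cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[assign(p, centers)].append(p)
        # Update step: move each center to the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:  # no center moved, so stop
            break
        centers = new_centers
    return centers


# Hypothetical sample data: two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centers = k_means(points, k=2)
labels = [assign(p, centers) for p in points]
```

Because the assignment step uses raw Euclidean distance, a column with a much larger value range than the others would dominate the result, which is why normalizing beforehand can matter.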

Use the Spark Cluster Assigner node to apply the learned model to unseen data.
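Conceptually, applying a learned k-means model to unseen data means assigning each new vector to its nearest cluster center. The sketch below illustrates that idea in plain Python; the centers, the `predict` helper, and the sample points are hypothetical and do not reflect the Spark Cluster Assigner node's actual code.

```python
from math import dist  # Euclidean distance, available from Python 3.8

# Hypothetical cluster centers standing in for a learned model.
centers = [(1.0, 1.5), (9.5, 9.0)]


def predict(point, centers):
    """Return the index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))


# Hypothetical unseen data that was not part of training.
unseen = [(0.0, 0.0), (8.0, 10.0)]
assigned = [predict(p, centers) for p in unseen]
```

The centers are not updated in this step; only the nearest-center lookup is performed.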

Node details

Input ports

Type: Spark Data
JavaRDD
Input data (JavaRDD)

Output ports

Type: Spark Data
Labeled input
The input data, with each row labeled with the cluster it was assigned to.
Type: Spark MLlib Model
MLlib Cluster Model
The learned MLlib cluster model, which can be applied to unseen data with the Spark Cluster Assigner node.

Extension

The Spark k-Means node is part of this extension:
