Node / Learner

Spark k-Means


This node applies the Apache Spark k-means clustering algorithm. It outputs the cluster centers for a predefined number of clusters; the number of clusters is not determined dynamically. K-means performs a crisp clustering that assigns each data vector to exactly one cluster. The node does not normalize the data; if normalization is required, consider using the "Spark Normalizer" node as a preprocessing step.
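To make the behavior described above concrete, here is a minimal plain-Python sketch of k-means with a fixed number of clusters and crisp assignment. It is a simplified stand-in, not Spark's implementation: it uses Lloyd's algorithm with randomly chosen initial centers, whereas Spark MLlib's KMeans defaults to k-means|| initialization. The helper names (`standardize`, `nearest_center`, `kmeans`) are hypothetical, and `standardize` is an illustrative z-score step only; it is not necessarily what the "Spark Normalizer" node computes.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def standardize(points: Sequence[Vector]) -> Tuple[List[Vector], List[float], List[float]]:
    """Hypothetical z-score step: scale each column to mean 0, std 1.

    Returns the scaled points plus the per-column means and standard
    deviations, so the same transform can later be applied to unseen data.
    """
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = [tuple((x - m) / s for x, m, s in zip(p, means, stds)) for p in points]
    return scaled, means, stds


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point: Sequence[float], centers: Sequence[Vector]) -> int:
    """Crisp assignment: index of the single closest center (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda j: squared_distance(point, centers[j]))


def kmeans(points: Sequence[Vector], k: int, max_iter: int = 20,
           seed: int = 1) -> Tuple[List[Vector], List[int]]:
    """Lloyd's k-means with a predefined k; returns (centers, labels)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(list(points), k)]
    for _ in range(max_iter):
        # Assignment step: every point belongs to exactly one cluster.
        labels = [nearest_center(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's old center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    # Final labels consistent with the returned centers.
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels


if __name__ == "__main__":
    # Column 2 is on a much larger scale than column 1; without scaling it
    # would dominate the Euclidean distance.
    data = [(1.0, 200.0), (1.5, 180.0), (0.8, 220.0),
            (9.0, 950.0), (9.5, 1000.0), (8.7, 1020.0)]
    scaled, means, stds = standardize(data)  # optional preprocessing step
    centers, labels = kmeans(scaled, k=2)
    print(labels)  # one cluster index per input row
```

Keeping `means` and `stds` around is a deliberate choice: any data later assigned to these clusters has to be transformed with the same statistics, otherwise the learned centers are compared against points on a different scale.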

Use the Spark Cluster Assigner node to apply the learned model to unseen data.
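Applying a learned k-means model to new rows usually means labeling each row with the index of its nearest cluster center. The sketch below assumes that nearest-center (Euclidean) rule; it is not the Spark Cluster Assigner's actual code. The function names, the center coordinates, and the normalization statistics are hypothetical values chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def standardize_with(point: Sequence[float], means: Sequence[float],
                     stds: Sequence[float]) -> Vector:
    """Apply the training set's z-score statistics to an unseen point (hypothetical helper)."""
    return tuple((x - m) / s for x, m, s in zip(point, means, stds))


def assign(points: Sequence[Vector], centers: Sequence[Vector]) -> List[Tuple[Vector, int]]:
    """Pair each row with the index of its nearest center (crisp assignment)."""
    labeled = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]  # Euclidean distance, Python 3.8+
        labeled.append((tuple(p), dists.index(min(dists))))
    return labeled


if __name__ == "__main__":
    centers = [(0.0, 0.0), (5.0, 5.0)]        # hypothetical learned centers
    means, stds = (10.0, 100.0), (2.0, 20.0)  # hypothetical training statistics
    unseen = [(10.8, 96.0), (19.2, 206.0)]
    rows = [standardize_with(p, means, stds) for p in unseen]
    for row, cluster in assign(rows, centers):
        print(row, "->", cluster)
```

If the training data was normalized before clustering, the unseen data must go through the same transformation with the same statistics before assignment, which is why `standardize_with` takes the stored means and standard deviations instead of recomputing them.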

Node details

Input ports
  1. Type: Spark Data
    JavaRDD
    Input data (JavaRDD)
Output ports
  1. Type: Spark Data
    Labeled input
    The input data, with each row labeled with the cluster it belongs to.
  2. Type: Spark MLlib Model
    MLlib Cluster Model
    MLlib Cluster Model

KNIME AG
Talacker 50
8001 Zurich, Switzerland
© 2025 KNIME AG. All rights reserved.