
Fuzzy Category Cleaner - Preparing Categorical Data for Machine Learning

Tags: Data-cleaning, Fuzzy-matching, Approximate-matching, Category-cleaning, Machine-learning-preprocessing
exorbyte-Team
Draft. Latest edits on Apr 13, 2026 2:27 PM

🧹 Cleaning Noisy Categories for ML


This workflow demonstrates how to clean categorical labels before training a machine learning model.

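Real-world datasets often contain inconsistent or misspelled category values (e.g., Logiystics, Eduzcation, Healthcar). If used directly, each misspelling is treated as its own category, so a single real category is split across several labels. This leaves fewer examples per label and can reduce model accuracy.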
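To make the fragmentation concrete, here is a minimal Python sketch that counts distinct labels in a small, made-up category column. The sample values are hypothetical and only reuse the misspellings mentioned above; they are not taken from the workflow's dataset.

```python
from collections import Counter

# Hypothetical category column (illustrative values, not the workflow's data).
raw_categories = [
    "Logistics", "Logiystics", "Education", "Eduzcation",
    "Healthcare", "Healthcar", "Logistics",
]

# Count how often each exact spelling occurs.
counts = Counter(raw_categories)

# Only three real categories are present, but every misspelling
# is counted as a separate label.
distinct_labels = len(counts)

print(counts)
print("distinct labels:", distinct_labels)
```

A model that one-hot encodes this column would see more categories than actually exist, each backed by fewer rows.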

🔑 Steps in this workflow:

  1. 📂 Load Product Sales Data: dataset with features Units Sold, Purchase Probability, Sales Channel, and noisy Category.

  2. 🏷️ Reference Category Labels: define the valid set of canonical categories (Electronics, Logistics, Education, Healthcare, Finance).

  3. 🔍 Approximate String Matcher: apply Levenshtein distance to align noisy category values with their closest valid label.

✅ Result: A cleaned dataset where all category labels are consistent and ML-ready.
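The matching step can be sketched outside KNIME in plain Python. This is not the Approximate String Matcher node's implementation, whose internals and settings are not described here. It is a rough illustration of Levenshtein-based alignment against the reference labels listed in step 2. The function names `levenshtein` and `closest_label`, the case-insensitive comparison, and the `max_distance` threshold of 2 are all assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


# Canonical categories from step 2 of the workflow.
REFERENCE = ["Electronics", "Logistics", "Education", "Healthcare", "Finance"]


def closest_label(value, reference=REFERENCE, max_distance=2):
    """Return the nearest reference label, or None if even the nearest one
    is more than max_distance edits away (threshold is an assumption)."""
    normalized = value.strip().lower()
    best = min(reference, key=lambda label: levenshtein(normalized, label.lower()))
    if levenshtein(normalized, best.lower()) <= max_distance:
        return best
    return None


noisy = ["Logiystics", "Eduzcation", "Healthcar", "Finance"]
cleaned = [closest_label(value) for value in noisy]
print(cleaned)
```

Returning `None` for values that are far from every reference label keeps unrelated entries from being forced into a valid category. Such rows can then be reviewed manually.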

External resources

  • exorbyte GmbH

Used extensions & nodes

Created with KNIME Analytics Platform version 5.8.2
  • exorbyte matchmaker toolbox (Trusted extension)
    exorbyte GmbH
    Version 1.2.4
    exorbyte-Team
  • KNIME Base nodes (Trusted extension)
    KNIME AG, Zurich, Switzerland
    Version 5.8.2
    knime

