Importing, preprocessing, and clustering of textual data

Tags: NLP, Natural Language Processing, Clustering
Draft. Latest edits on May 5, 2020 12:20 PM
This workflow clusters a set of newsgroup documents by topic. The data come from the 20 newsgroups dataset. The workflow starts with a data table of newsgroup documents from two categories, politics.guns and sport.baseball. First, the rows are converted into documents whose category is the class politics or sport. The documents are then preprocessed by filtering and lemmatizing. Next, the documents are transformed into a bag of words, which is filtered again: only terms that occur in at least 1% of the documents (that is, in at least 2 documents) are kept as features. The documents are then transformed into document vectors, a numerical representation of the documents, which serve as input for hierarchical clustering based on the Manhattan, Euclidean, and cosine distance measures.
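
The steps after preprocessing can be sketched outside KNIME in plain Python. This is only an illustration, not the workflow's actual nodes: the four toy documents, the function names, the use of raw term counts as vector entries, and the average-linkage rule are all assumptions, since the description does not specify them. The minimum document frequency is passed as an absolute count (2), matching the "at least 2 documents" threshold above.

```python
from collections import Counter
from math import sqrt

# Toy stand-ins for already filtered and lemmatized documents (not the real dataset).
docs = [
    "gun law ban gun owner",
    "gun owner right law",
    "pitcher inning baseball game",
    "baseball game team pitcher",
]

def document_vectors(docs, min_df=2):
    """Build term-count vectors, keeping only terms found in at least min_df documents."""
    bags = [Counter(d.split()) for d in docs]            # bag of words per document
    df = Counter(term for bag in bags for term in bag)   # document frequency per term
    vocab = sorted(t for t, n in df.items() if n >= min_df)
    return vocab, [[bag[t] for t in vocab] for bag in bags]

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine distance, i.e. 1 minus cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

def hierarchical_clusters(vectors, dist, k=2):
    """Agglomerative clustering (average linkage, an assumption), stopped at k clusters."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(dist(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters.pop(j)   # j > i, so index i stays valid
        clusters[i].extend(merged)
    return sorted(sorted(c) for c in clusters)

vocab, vectors = document_vectors(docs, min_df=2)
for name, dist in [("manhattan", manhattan), ("euclidean", euclidean), ("cosine", cosine)]:
    print(name, hierarchical_clusters(vectors, dist, k=2))
```

On this toy input, all three distance measures place the first two documents in one cluster and the last two in the other. On real data the measures can disagree, which is presumably why the workflow compares them.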

External resources

  • Newsgroup dataset

Used extensions & nodes

Created with KNIME Analytics Platform version 4.1.0. Note: Not all extensions may be displayed.
  • KNIME Core (trusted extension), KNIME AG, Zurich, Switzerland, version 4.1.0
  • KNIME Textprocessing (trusted extension), KNIME AG, Zurich, Switzerland, version 4.1.0
