
Document Classification: Model Training and Deployment

Tags: NLP, Natural Language Processing, Text Classification, Practicing Data Science
Author: knime
Version v1.0 (latest), created on Oct 20, 2023 2:07 PM
This workflow performs spam classification on YouTube comments. It starts with a data table of comments taken from the YouTube Spam Collection Data Set in the UCI Machine Learning Repository [1]; the data is available in the workflow directory. Each comment belongs to one of two classes, spam or ham (non-spam), and the two classes are roughly equally represented.

First, the comments are converted into documents whose category is the class (spam or ham). The documents are then preprocessed by filtering and stemming. Next, they are transformed into a bag of words, which is filtered again: only terms that occur in at least 1% of the documents (that is, in at least 3 documents) are kept as features, and all other terms are filtered out. Finally, the documents are transformed into document vectors, numerical representations of the documents that are then used for classification with a support vector machine (SVM).

The lower part of the workflow contains the deployment workflow.
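
The preprocessing and feature-extraction steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the KNIME implementation: the regex tokenizer, the small stop-word list, and the suffix-stripping `stem` function are hypothetical stand-ins for the workflow's filtering and stemming nodes. The document-frequency threshold is assumed to be the larger of 1% of the documents and 3 documents, and the document vectors are assumed to be binary (term present or absent). The example comments are invented.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (not KNIME's built-in list).
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "are", "to", "of", "my", "this", "it", "i", "you"}


def stem(token: str) -> str:
    """Crude suffix stripping; a hypothetical stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase, split into alphanumeric tokens, drop stop words and 1-char tokens, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS and len(t) > 1]


def build_vocabulary(docs: list[list[str]], min_fraction: float = 0.01, min_docs: int = 3) -> list[str]:
    """Keep terms whose document frequency reaches max(min_docs, ceil(min_fraction * n)) (assumed rule)."""
    doc_freq = Counter(term for doc in docs for term in set(doc))
    threshold = max(min_docs, math.ceil(min_fraction * len(docs)))
    return sorted(term for term, df in doc_freq.items() if df >= threshold)


def to_vector(doc: list[str], vocabulary: list[str]) -> list[int]:
    """Binary document vector: 1 if the vocabulary term occurs in the document, else 0."""
    present = set(doc)
    return [1 if term in present else 0 for term in vocabulary]


# Invented toy comments: the first three are spam-like, the last three ham-like.
comments = [
    "Check out my channel for free gift cards",
    "Subscribe to my channel for free stuff",
    "Free gift cards, check my channel now",
    "Great song, love it",
    "This song is great",
    "Love this song so much",
]
docs = [preprocess(c) for c in comments]
vocab = build_vocabulary(docs)
vectors = [to_vector(d, vocab) for d in docs]
```

With only six toy comments, 1% of the documents rounds up to a single document, so the 3-document minimum decides which terms survive: here only `channel`, `free`, and `song` remain as features.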
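
The classification step can be sketched in the same way. The SVM node and kernel settings used in the workflow are not described here, so the example below assumes a linear SVM and trains it with a basic version of Pegasos (stochastic sub-gradient descent on the L2-regularized hinge loss, without the optional projection step). The function names `train_linear_svm` and `predict`, the hyperparameters, and the toy binary vectors (label +1 for spam, -1 for ham) are all hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.1, epochs=500, seed=0):
    """Basic Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    X is a list of feature vectors and y a list of labels in {+1, -1}. A constant
    1.0 feature is appended so the bias is learned (and, for simplicity,
    regularized) together with the weights.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the training points in a new random order each epoch
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step for the regularization term: shrink the weights.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss sub-gradient, applied only when the margin is violated.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (spam) if w . [x, 1] >= 0, else -1 (ham)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical binary document vectors: +1 = spam, -1 = ham.
X = [[1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1], [0, 0, 1]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
predictions = [predict(w, x) for x in X]
```

Pegasos is used here only because it fits in a few lines of dependency-free code; in practice a dedicated SVM implementation, such as the one in the workflow, would normally be preferred.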

External resources

  • YouTube Spam Collection Dataset
  • Sentiment Classification of Documents

Used extensions & nodes

Created with KNIME Analytics Platform version 4.5.0
  • KNIME Base nodes (trusted extension), KNIME AG, Zurich, Switzerland, version 4.5.0
  • KNIME Javasnippet (trusted extension), KNIME AG, Zurich, Switzerland, version 4.5.0
  • KNIME Textprocessing (trusted extension), KNIME AG, Zurich, Switzerland, version 4.5.0

Legal

By using or downloading the workflow, you agree to our terms and conditions.


KNIME AG
Talacker 50
8001 Zurich, Switzerland
© 2025 KNIME AG. All rights reserved.