Document Classification: Model Training and Deployment

Tags: NLP, Natural Language Processing, Text Classification, Practicing Data Science

This workflow classifies YouTube comments as spam or ham (non-spam). It starts with a data table of comments taken from the YouTube Spam Collection Data Set at the UCI ML Repository [1]; the data is available in the workflow directory. The two categories are roughly equally represented.

First, the comments are converted into documents, each labeled with its class, spam or ham. The documents are then preprocessed by filtering and stemming and transformed into a bag of words, which is filtered again: only terms that occur in at least 1% of the documents (at least 3 documents) are kept as features. The documents are then transformed into document vectors, a numerical representation of the documents, which are used for classification with a support vector machine. The lower part of the workflow contains the deployment workflow.
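The feature-extraction steps described above can be illustrated outside KNIME with a minimal Python sketch. It is not the workflow's implementation: the toy comments, the stop-word list, the suffix-stripping stemmer, and the function names `preprocess`, `build_vocabulary`, and `to_vector` are all assumptions made for illustration, and the workflow itself uses KNIME Textprocessing nodes for these steps.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the workflow's actual filters are not listed here.
STOP_WORDS = {"the", "a", "to", "and", "is", "this", "my", "for", "of"}


def preprocess(text):
    """Lowercase, tokenize, drop stop words and very short tokens, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS and len(t) > 2]
    # Naive suffix stripping as a stand-in for a real stemmer (an assumption).
    return [re.sub(r"(ing|ed|s)$", "", t) for t in kept]


def build_vocabulary(docs, min_fraction=0.01):
    """Keep only terms that occur in at least min_fraction of the documents."""
    doc_freq = Counter(term for doc in docs for term in set(doc))
    min_docs = min_fraction * len(docs)
    return sorted(t for t, df in doc_freq.items() if df >= min_docs)


def to_vector(doc, vocabulary):
    """Binary document vector: 1 if the term occurs in the document, else 0."""
    present = set(doc)
    return [1 if term in present else 0 for term in vocabulary]


# Toy comments invented for this example (not taken from the dataset).
comments = [
    ("Check out my channel for free gifts", "spam"),
    ("Subscribe to my channel please", "spam"),
    ("This song is amazing", "ham"),
    ("Love this song so much", "ham"),
]
docs = [preprocess(text) for text, _ in comments]
labels = [label for _, label in comments]

# With only four toy documents, a 50% threshold replaces the 1% threshold.
vocabulary = build_vocabulary(docs, min_fraction=0.5)
vectors = [to_vector(doc, vocabulary) for doc in docs]

print(vocabulary)  # ['channel', 'song']
print(vectors)     # [[1, 0], [1, 0], [0, 1], [0, 1]]
```

In this toy run, only `channel` and `song` appear in at least half of the documents, so they are the only features that survive the filter. The resulting `vectors` and `labels` could then be passed to a support vector machine implementation, for example scikit-learn's `sklearn.svm.SVC`, to train the classifier.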

External resources

  • Sentiment Classification of Documents
  • YouTube Spam Collection Dataset

Used extensions & nodes

Created with KNIME Analytics Platform version 4.5.0
  • KNIME Base nodes (trusted extension), KNIME AG, Zurich, Switzerland, version 4.5.0
  • KNIME Javasnippet (trusted extension), KNIME AG, Zurich, Switzerland, version 4.5.0
  • KNIME Textprocessing (trusted extension), KNIME AG, Zurich, Switzerland, version 4.5.0

Legal

By using or downloading the workflow, you agree to our terms and conditions.

© 2023 KNIME AG. All rights reserved.