Workflow: 17_TopicExtraction_with_the_ElbowMethod

Topic Extraction

Tags: Topic Extraction, Clustering, K-Means, Machine learning (+3 more)

This workflow shows how to extract topics from text documents using the Topic Extractor node. It reads textual data from a table (alternatively, the data can be fetched directly from news websites using the RSS Feed Reader node) and converts it into documents. The documents are then preprocessed, i.e. tagged, filtered, lemmatized, etc., and passed to the Topic Extractor node.

The node requires the number of topics to extract to be specified in advance. Several methods exist for determining a suitable number of topics; this workflow uses the "Elbow Method". The method runs k-means clustering on the input documents for a range of cluster counts (e.g. from 1 to 20) and, for each count, calculates the within-cluster sum of squared errors (SSE), i.e. the sum of the squared distances of each data point to its cluster center. The SSE value for each number of clusters is then plotted in a Scatter Plot. The best number of clusters is the one at which the SSE stops dropping sharply and the curve flattens, forming an angle (the "elbow") in the plot.

Note that the Elbow Method does not work for all data sets. If there is no clear elbow in the plot, try a different approach, such as the Silhouette Coefficient. Once the optimal number of clusters/topics has been determined, the Topic Extractor node can be executed, and a tag cloud is created to visualize the topics' terms.
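
For readers who want to see the Elbow Method outside of KNIME, the following is a minimal Python sketch of the idea, not the workflow's actual implementation. It is written under several assumptions that do not come from the workflow: the toy 2D points stand in for document vectors, the helper names (squared_distance, farthest_first_centers, kmeans_sse, elbow) are hypothetical, the farthest-first initialization is chosen only to keep the run deterministic, and the "largest second difference" rule is just one simple heuristic for locating the elbow numerically instead of reading it off a plot.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(points, k):
    """Deterministic initialization (an assumption of this sketch):
    start at the first point, then repeatedly add the point that is
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    return centers


def kmeans_sse(points, k, max_iterations=100):
    """Run basic k-means (Lloyd's algorithm) and return the within-cluster SSE."""
    centers = farthest_first_centers(points, k)
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # SSE: squared distance of every point to its closest final center.
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


def elbow(sse_by_k):
    """Heuristic: pick the k with the largest second difference of the SSE
    curve, i.e. where the decrease slows down most sharply.
    Assumes the keys are consecutive integers."""
    ks = sorted(sse_by_k)
    return max(
        ks[1:-1],
        key=lambda k: sse_by_k[k - 1] - 2 * sse_by_k[k] + sse_by_k[k + 1],
    )


# Toy data (hypothetical): three well-separated groups of four points each.
points = [
    (0, 0), (1, 0), (0, 1), (1, 1),
    (10, 10), (11, 10), (10, 11), (11, 11),
    (20, 0), (21, 0), (20, 1), (21, 1),
]

sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 7)}
best_k = elbow(sse_by_k)

for k in sorted(sse_by_k):
    print(f"k={k}  SSE={sse_by_k[k]:.2f}")
print("elbow at k =", best_k)
```

On this toy data the SSE falls steeply up to three clusters and barely changes afterwards, so the heuristic selects k = 3. On real document vectors the curve is usually much smoother, which is why the description above recommends inspecting the plot and falling back to another criterion when no clear elbow appears.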

External resources

  • Topic Extraction: Optimizing the Number of Topics with the Elbow Method

Used extensions & nodes

Created with KNIME Analytics Platform version 4.1.0
  • KNIME Core (Trusted extension), KNIME AG, Zurich, Switzerland, Version 4.1.0
  • KNIME JavaScript Views (Trusted extension), KNIME AG, Zurich, Switzerland, Version 4.1.0
  • KNIME Math Expression (JEP) (Trusted extension), KNIME AG, Zurich, Switzerland, Version 4.1.0
  • KNIME Textprocessing (Trusted extension), KNIME AG, Zurich, Switzerland, Version 4.1.0

Legal

By using or downloading the workflow, you agree to our terms and conditions.


KNIME
Open for Innovation

KNIME AG
Talacker 50
8001 Zurich, Switzerland
© 2023 KNIME AG. All rights reserved.