Topic Extractor (STM)

The component trains a Structural Topic Model (STM) via unsupervised learning. It integrates the R implementation of STM described by Roberts, Stewart and Tingley, Journal of Statistical Software (2019) (cran.r-project.org/web/packages/stm/vignettes/stmVignette.pdf), using the R library 'stm' (cran.r-project.org/web/packages/stm).

On its first execution, the component automatically installs R and all required libraries. For this to work, you need to install Conda (we recommend Miniconda: "docs.conda.io/en/latest/miniconda.html"). KNIME Analytics Platform can automatically detect the default Conda installation path. You can verify that it uses the correct path via "File > Preferences > KNIME > Conda".

DISCLAIMER: this component does not work on Apple M1 systems, because the 'stm' package is not available for 'osx-arm64' via 'conda-forge' ("anaconda.org/conda-forge/r-stm"). On Intel-based Apple systems, you might need to install additional software manually after the Conda Environment Propagation node executes. For details, visit docs.knime.com/latest/r_installation_guide.

Use the component settings to select a column of the Document type from the KNIME Textprocessing extension. Simply apply the Strings to Document node, and any other preprocessing required (stopword removal, stemming, ...), upstream of this component. Given K, the number of topics to create, the component returns the predicted topic for each document as well as a set of terms representing each of the K topics.

Optionally, you can provide metadata columns and fields to the algorithm. Metadata fields are extracted from the Document column; metadata columns are additional columns you provide at the input. When you provide more than one metadata field/column, make sure to specify an operator (+, -, /, *) for the automatically generated 'Prevalence Formula'.
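As a rough illustration of why an operator is needed between metadata fields, the sketch below joins field names with the chosen operators into an R-style one-sided formula string, the form the 'stm' package accepts for its prevalence argument (for example `~ rating + s(day)` in the stm vignette). The helper name build_prevalence_formula and the exact output format are assumptions for illustration, not the component's actual implementation.

```python
def build_prevalence_formula(fields, operators):
    """Join metadata fields into an R-style one-sided formula string.

    Hypothetical helper: one operator is expected between each pair of
    consecutive fields, mirroring the component's 'Prevalence Formula' setting.
    """
    allowed = {"+", "-", "/", "*"}
    if not fields:
        raise ValueError("at least one metadata field/column is required")
    if len(operators) != len(fields) - 1:
        raise ValueError("need exactly one operator between consecutive fields")
    for op in operators:
        if op not in allowed:
            raise ValueError(f"unsupported operator: {op!r}")
    parts = [fields[0]]
    for op, field in zip(operators, fields[1:]):
        parts.extend([op, field])
    return "~ " + " ".join(parts)


print(build_prevalence_formula(["author", "year"], ["+"]))  # ~ author + year
```

With a single metadata field no operator is needed, which matches the note above that operators only matter when more than one field/column is selected.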

Component details

Input ports
  1. Type: Table
    Document Table
    Data table containing the document collection to analyze, stored in a column of the KNIME Textprocessing Document type (apply the 'Strings to Document' node first). Each row contains one document. Documents can be preprocessed (stopword removal, stemming, ...).
Output ports
  1. Type: R Workspace
    R Model
    The R object with the trained model. Use the component "Topic Assigner (STM)" to apply this model to new documents.
  2. Type: Table
    Document with Topics Table
    The document collection with topic assignments and, for each document, the probability of belonging to each topic. These probabilities are taken from the gamma (theta) matrix returned by the 'stm_tidiers' R function. Rows with missing text or with missing values in the selected metadata fields/columns receive missing values.
  3. Type: Table
    Terms of Topics
    The topics with their terms and each term's weight per topic. The weight is taken from the beta matrix returned by the 'stm_tidiers' R function. The table lists at most the number of terms per topic set in the component settings.
  4. Type: Table
    Scores Table
    A table listing metrics for the model, computed on an automatically held-out partition of the documents. If "Optimal K Search" is enabled, the table contains one row for each K tested. No precise method exists for selecting the best K automatically. Nevertheless, four metrics can help in making this decision: exclusivity, semantic coherence, residual variance, and held-out likelihood. The higher the exclusivity, the more each topic's top terms are specific to that topic rather than shared with other topics. The higher the semantic coherence, the more often a topic's most probable terms co-occur in the same documents. The lower the residual variance, the better the model fits. The higher the held-out likelihood, the better the model predicts new documents. Increasing K typically decreases coherence, increases exclusivity, and decreases residual variance, but it can lead to overfitting, which reduces the held-out likelihood.
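
The relationship between the Document with Topics and Terms of Topics tables and the fitted model can be sketched in plain Python. Assuming that each document's predicted topic is the one with the highest theta (gamma) probability, and that the Terms of Topics table keeps the highest-weighted beta terms up to the configured maximum, the two hypothetical helpers below (assign_topics, top_terms) reproduce that logic on toy numbers. They are an illustration only, not the component's R code; None rows stand in for documents skipped because of missing text or metadata.

```python
from collections import defaultdict


def assign_topics(theta):
    """Pick the most probable topic (1-based, as in R) for each document row.

    Rows that are None stand in for documents with missing text or metadata;
    they stay missing in the output.
    """
    assignments = []
    for row in theta:
        if row is None:
            assignments.append(None)
            continue
        best = max(range(len(row)), key=lambda k: row[k])
        assignments.append((best + 1, row[best]))
    return assignments


def top_terms(beta_rows, max_terms):
    """Group tidy (topic, term, weight) rows by topic and keep the heaviest terms."""
    by_topic = defaultdict(list)
    for topic, term, weight in beta_rows:
        by_topic[topic].append((term, weight))
    return {
        topic: sorted(terms, key=lambda tw: tw[1], reverse=True)[:max_terms]
        for topic, terms in sorted(by_topic.items())
    }


theta = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], None]
print(assign_topics(theta))  # [(1, 0.7), (3, 0.6), None]

beta_rows = [
    (1, "price", 0.30), (1, "market", 0.25), (1, "goal", 0.01),
    (2, "goal", 0.40), (2, "match", 0.35), (2, "price", 0.02),
]
print(top_terms(beta_rows, 2))
# {1: [('price', 0.3), ('market', 0.25)], 2: [('goal', 0.4), ('match', 0.35)]}
```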

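Of the four metrics in the Scores Table, semantic coherence is the easiest to reproduce by hand. The sketch below follows the definition from Mimno et al. (2011) that the stm vignette cites: for a topic's top M words ordered by probability, it sums log((D(w_i, w_j) + 1) / D(w_j)) over all pairs where w_j ranks above w_i, with D counting the documents that contain the given word(s). The helper name semantic_coherence and the set-of-words document representation are assumptions; the 'stm' package computes this internally on its own document format.

```python
import math


def semantic_coherence(top_words, documents):
    """Coherence of one topic, following Mimno et al. (2011).

    top_words: the topic's top-M words, most probable first.
    documents: iterable of documents, each an iterable of words.
    """
    docs = [set(d) for d in documents]

    def doc_freq(*words):
        # Number of documents containing every word in `words`.
        return sum(1 for d in docs if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            higher, lower = top_words[j], top_words[i]
            base = doc_freq(higher)
            if base == 0:
                raise ValueError(f"{higher!r} does not occur in any document")
            score += math.log((doc_freq(lower, higher) + 1) / base)
    return score


docs = [{"price", "market"}, {"price", "market", "stock"}, {"goal", "match"}]
print(semantic_coherence(["price", "market"], docs))  # log(3/2), about 0.405
print(semantic_coherence(["price", "goal"], docs))    # log(1/2), about -0.693
```

A topic whose top words rarely appear in the same documents scores lower, as the second call shows.
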
Used extensions & nodes

Created with KNIME Analytics Platform version 4.7.4
  • KNIME Base nodes (Trusted extension), KNIME AG, Zurich, Switzerland, version 4.7.2
  • KNIME Conda Integration (Trusted extension), KNIME AG, Zurich, Switzerland, version 4.7.0
  • KNIME Data Generation (Trusted extension), KNIME AG, Zurich, Switzerland, version 4.7.0
  • KNIME Interactive R Statistics Integration (Trusted extension), KNIME AG, Zurich, Switzerland, version 4.7.0
  • KNIME JavaScript Views (Trusted extension), KNIME AG, Zurich, Switzerland, version 4.7.0
  • KNIME Javasnippet (Trusted extension), KNIME AG, Zurich, Switzerland, version 4.7.0
  • KNIME Math Expression (JEP) (Trusted extension), KNIME AG, Zurich, Switzerland, version 4.7.0
  • KNIME Plotly (Trusted extension), KNIME AG, Zurich, Switzerland, version 4.7.0
  • KNIME Quick Forms (Trusted extension), KNIME AG, Zurich, Switzerland, version 4.7.4
  • KNIME Textprocessing (Trusted extension), KNIME AG, Zurich, Switzerland, version 4.7.0

Legal

By using or downloading the component, you agree to our terms and conditions.

KNIME AG
Talacker 50
8001 Zurich, Switzerland
© 2023 KNIME AG. All rights reserved.