
Stanford tagger (deprecated)

Other Data Types, Text Processing, Enrichment, Streamable

This node has been deprecated and its use is not recommended. Please search for updated nodes instead.


This node assigns a part-of-speech (POS) tag to each term of a document. It supports French, English, and German texts. The underlying tagger models come from the Stanford NLP group:
http://nlp.stanford.edu/software/tagger.shtml

For English texts the Penn Treebank tag set is used:
http://www.cis.upenn.edu/~treebank
For German texts the STTS tag set is used:
http://www.ims.uni-stuttgart.de/projekte/CQPDemos/Bundestag/help-tagset.html
For French texts the French Treebank tag set is used:
http://www.llf.cnrs.fr/Gens/Abeille/French-Treebank-fr.php
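
To illustrate what a tagged term looks like, here is a minimal sketch using a few Penn Treebank tags. The sentence and its tags are hand-labelled for illustration and are not output from this node; the `describe` helper and the `PENN_DESCRIPTIONS` table are hypothetical names introduced here.

```python
# Hand-labelled example terms with Penn Treebank POS tags.
# This is an illustration only, not output produced by the node.
tagged_terms = [("The", "DT"), ("dog", "NN"), ("barks", "VBZ")]

# A small excerpt of the Penn Treebank tag set (hypothetical lookup table).
PENN_DESCRIPTIONS = {
    "DT": "determiner",
    "NN": "noun, singular or mass",
    "VBZ": "verb, 3rd person singular present",
}


def describe(terms):
    """Return one readable line per (term, tag) pair."""
    return [f"{term} -> {tag} ({PENN_DESCRIPTIONS[tag]})" for term, tag in terms]


for line in describe(tagged_terms):
    print(line)
```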

Note: the provided tagger models vary in memory consumption and processing speed. The English bidirectional, German hgc, and German dewac models in particular require a lot of memory. To use these models, it is recommended to run KNIME with at least 2GB of heap space. To increase the heap space, change the -Xmx setting in the knime.ini file. If KNIME is running with less than 1.5GB of heap space, it is recommended to use the English left3words, English left3words caseless, or German fast models for tagging English or German texts.
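
The heap-size guidance above can be sketched as a small check. This is a rough illustration, assuming the knime.ini file contains a standard JVM line such as `-Xmx2048m`; the helper names `parse_xmx_bytes` and `suggest_models` are hypothetical and not part of KNIME. The note does not say what to do between 1.5GB and 2GB, so the sketch conservatively suggests only the lighter models below 2GB.

```python
import re

# Model names as listed in the note above.
HEAVY_MODELS = ["English bidirectional", "German hgc", "German dewac"]
LIGHT_MODELS = ["English left3words", "English left3words caseless", "German fast"]

GIB = 1024 ** 3
_UNIT_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": GIB}


def parse_xmx_bytes(ini_text):
    """Return the heap limit in bytes from the last -Xmx line, or None if absent."""
    matches = re.findall(r"^-Xmx(\d+)([kKmMgG]?)\s*$", ini_text, flags=re.MULTILINE)
    if not matches:
        return None
    number, unit = matches[-1]
    return int(number) * _UNIT_BYTES[unit.lower()]


def suggest_models(heap_bytes):
    """Suggest models for a heap size (assumption: below 2GB, light models only)."""
    if heap_bytes >= 2 * GIB:
        return LIGHT_MODELS + HEAVY_MODELS
    return list(LIGHT_MODELS)


# Example knime.ini fragment (made up for illustration).
sample_ini = "-vmargs\n-Xmx2048m\n"
heap = parse_xmx_bytes(sample_ini)
print(suggest_models(heap))
```

To read a real installation's setting, the contents of knime.ini could be passed to `parse_xmx_bytes` in place of `sample_ini`.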

Descriptions of the models (taken from the website of the Stanford NLP group):

  • English bidirectional: Trained on WSJ sections 0-18 using a bidirectional architecture and including word shape and distributional similarity features.
  • English left3words: Trained on WSJ sections 0-18 and extra parser training data using the left3words architecture and including word shape and distributional similarity features.
  • English left3words caseless: Trained on WSJ sections 0-18 and extra parser training data using the left3words architecture and including word shape and distributional similarity features. Ignores case.
  • German hgc: Trained on the first 80% of the Negra corpus, which uses the STTS tagset.
  • German dewac: This model uses features from the distributional similarity clusters built from the deWac web corpus.
  • German fast: Lacks distributional similarity features, but is several times faster than the other German models.
  • French: Trained on the French treebank.

Node details

Input ports
  1. Type: Table
    Documents input table
    The input table containing the documents to tag.
Output ports
  1. Type: Table
    Documents output table
    An output table containing the tagged documents.

© 2025 KNIME AG. All rights reserved.