Keygraph Keyword Extractor

This node analyses documents and extracts relevant keywords using the graph-based approach described in "KeyGraph: Automatic Indexing by Co-occurrence Graph based on Building Construction Metaphor" by Yukio Ohsawa.
First, a predetermined number of terms is selected based on frequency (the high frequency set, HF) and added as the initial nodes of the graph.
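The selection of the high frequency set can be sketched in Python as follows. This is only an illustration, not the node's implementation: the function name select_high_frequency_terms, the hf_size parameter, the representation of a document as a list of tokenised sentences, and the alphabetical tie-breaking are all assumptions made for the example.

```python
from collections import Counter

def select_high_frequency_terms(sentences, hf_size):
    """Return the hf_size most frequent terms across all sentences."""
    counts = Counter(term for sentence in sentences for term in sentence)
    # Sort by descending frequency, then alphabetically so that ties are
    # resolved deterministically (the tie-breaking rule is an assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:hf_size]]

# Hypothetical document: each sentence is a list of already tokenised terms.
sentences = [
    ["graph", "keyword", "graph"],
    ["keyword", "cluster"],
    ["graph", "cluster", "term"],
]
hf_terms = select_high_frequency_terms(sentences, hf_size=3)
print(hf_terms)
```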
The association strength between each pair of these terms is then calculated using the following scoring method: assoc(term1, term2) = the sum, over every sentence in the document, of min(occurrence frequency of term1 in the sentence, occurrence frequency of term2 in the sentence). The top |HF|-1 associations are inserted into the graph as edges.
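A minimal Python sketch of the association score and of keeping the top |HF|-1 associations is given below. The function names, the tokenised-sentence input and the tie-breaking order are assumptions made for illustration; the node's actual code may differ.

```python
from collections import Counter
from itertools import combinations

def association(term1, term2, sentences):
    """Sum, over all sentences, of the smaller of the two per-sentence frequencies."""
    total = 0
    for sentence in sentences:
        counts = Counter(sentence)
        total += min(counts[term1], counts[term2])
    return total

def strongest_edges(hf_terms, sentences):
    """Return the |HF|-1 strongest associations as (score, term1, term2) tuples."""
    scored = [
        (association(a, b, sentences), a, b)
        for a, b in combinations(sorted(hf_terms), 2)
    ]
    # Strongest first; ties are broken alphabetically (an assumption).
    scored.sort(key=lambda edge: (-edge[0], edge[1], edge[2]))
    return scored[: len(hf_terms) - 1]

# Hypothetical document: each sentence is a list of already tokenised terms.
sentences = [
    ["graph", "keyword", "graph", "keyword"],
    ["keyword", "cluster"],
    ["graph", "cluster", "term"],
]
edges = strongest_edges(["graph", "keyword", "cluster"], sentences)
print(edges)
```

In this example "graph" and "keyword" both occur twice in the first sentence, so that sentence contributes min(2, 2) = 2 to their association.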
If an edge between two terms is the only path that connects them, it is pruned.
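An edge that is the only path between its two endpoints is what graph theory calls a bridge. The sketch below shows one simple way to detect and drop such edges: temporarily ignore the edge and check with a breadth-first search whether its endpoints are still connected. The function names and the representation of edges as unique (term, term) tuples are assumptions; this is not necessarily how the node implements the pruning.

```python
from collections import defaultdict, deque

def is_connected_without(edges, skipped, start, goal):
    """Check whether start can still reach goal when one edge is ignored."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if (a, b) != skipped:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return False

def prune_bridges(edges):
    """Keep only the edges whose endpoints are also connected by another path."""
    return [
        edge for edge in edges
        if is_connected_without(edges, edge, edge[0], edge[1])
    ]

# Hypothetical graph: a triangle a-b-c plus a single edge c-d hanging off it.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
pruned = prune_bridges(edges)
print(pruned)
```

The edge c-d is removed because nothing else links c and d, while the three triangle edges survive because each pair of endpoints stays connected through the third term.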
The graph's connected subgraphs are then extracted and treated as "concept" clusters. A new batch of terms is added based on their key score, which is the conditional probability that a term will be used if the author has all the concepts (clusters) in mind: P(UNION(t|g)), where t is the term and the union is taken over every cluster g in the set of clusters.
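Extracting the connected subgraphs can be sketched as a plain connected-components search. The example below covers only the cluster extraction, not the key score. The function name and inputs are assumptions, and whether the node treats an isolated term as a cluster of its own is not stated in this description; this sketch does.

```python
def connected_components(nodes, edges):
    """Group nodes into clusters of mutually reachable terms."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    clusters, seen = [], set()
    for node in nodes:
        if node in seen:
            continue
        # Depth-first traversal collecting everything reachable from node.
        cluster, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current in cluster:
                continue
            cluster.add(current)
            stack.extend(adjacency[current] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters

# Hypothetical pruned graph: a triangle a-b-c and an isolated term d.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("a", "c")]
clusters = connected_components(nodes, edges)
print(clusters)
```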
Each of these new terms is then linked to every cluster using the strongest scoring edge amongst the possible ones.
Finally, all the terms in the graph are rated using this formula: score(t) = the sum, over every edge connecting t to another term w and over every sentence, of min(freq(t), freq(w)), where freq is a term's frequency within that sentence.
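The final rating can be sketched as follows: each edge contributes its sentence-summed min-frequency strength to the score of both terms it connects. The function name term_scores and the input shapes (edges as term pairs, sentences as token lists) are assumptions for illustration only.

```python
from collections import Counter

def term_scores(edges, sentences):
    """Score each term by summing the strength of every edge that touches it."""
    sentence_counts = [Counter(sentence) for sentence in sentences]
    scores = Counter()
    for a, b in edges:
        # Edge strength: sum over sentences of the smaller per-sentence frequency.
        strength = sum(min(counts[a], counts[b]) for counts in sentence_counts)
        scores[a] += strength
        scores[b] += strength
    return dict(scores)

# Hypothetical document and final graph edges.
sentences = [
    ["graph", "keyword", "graph", "keyword"],
    ["keyword", "cluster"],
    ["graph", "cluster", "term"],
]
edges = [("graph", "keyword"), ("cluster", "graph")]
scores = term_scores(edges, sentences)
print(scores)
```

Here "graph" ends up with the highest score because it takes part in both edges: 2 from its edge with "keyword" and 1 from its edge with "cluster".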
Setting the console's output level to DEBUG will make this node display the contents of the clusters after the pruning phase.

Node details

Input ports
  1. Type: Table
    Documents input table
    The input table which contains the documents to analyse.
Output ports
  1. Type: Table
    Keywords output table
    The output table which contains (keyword term, score, associated document) tuples.

