Dictionary Tagger (Multi Column)

Manipulator

This node recognizes named entities listed in one or more dictionary columns and assigns each a specified tag value and tag type. Optionally, the recognized named entity terms can be set as unmodifiable, meaning that subsequent preprocessing nodes will neither modify nor filter them. However, subsequent tagging nodes can still overwrite the tags of an unmodifiable term.

If the same entity is contained in several dictionaries, it will be tagged once for every matching configuration. For example, if the document contains the term "London" and "London" also appears in three different dictionaries, the term will receive all three tags that have been set for those dictionaries.

The sequence of the tags depends on the order of the dictionaries within the node dialog. The order can be changed by using the up/down arrow buttons.

Note: if a dictionary contains a multi-word entity and a succeeding dictionary contains one word of that multi-word entity, only the single word will be tagged.

Example:

  • Document: "New York is beautiful."
  • Dictionary 1: "New York"
  • Dictionary 2: "York"

In this case, only "York" will be tagged. If a third dictionary also contains "New York", then "New York" will be tagged with the tags set for the first and the third dictionary.

The order of the entities within a dictionary is also important. As with the order of the dictionaries, entities are tagged in sequence, starting with the first entity in the dictionary.
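
The outcomes described above can be reproduced with a small toy model in Python. This is only a sketch of the observable behavior, not the node's actual implementation. It assumes whitespace tokenization, exact case-sensitive matching, and that a later match replaces any earlier overlapping match. The function names and the tag names (LOCATION, CITY, PLACE) are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: each dictionary is (tag, [entities]).
Dictionary = Tuple[str, List[str]]
Span = Tuple[int, int]  # (start, end) word indices, end exclusive


def find_spans(words: List[str], entity_words: List[str]) -> List[Span]:
    """Return every (start, end) span where entity_words occurs in words."""
    n = len(entity_words)
    return [(i, i + n) for i in range(len(words) - n + 1)
            if words[i:i + n] == entity_words]


def tag_document(text: str, dictionaries: List[Dictionary]) -> Dict[str, List[str]]:
    """Toy model: later matches replace earlier overlapping matches."""
    words = text.rstrip(".").split()
    # owner[i] is the span of the entity currently covering word i, if any.
    owner: List[Optional[Span]] = [None] * len(words)

    # Dictionaries are applied in dialog order, entities in dictionary order.
    for _tag, entities in dictionaries:
        for entity in entities:
            for span in find_spans(words, entity.split()):
                # Drop every earlier span that overlaps the new one.
                for i, old in enumerate(owner):
                    if old is not None and old[0] < span[1] and span[0] < old[1]:
                        owner[i] = None
                for i in range(span[0], span[1]):
                    owner[i] = span

    # A surviving entity gets the tag of every dictionary that contains it,
    # listed in dictionary order.
    result: Dict[str, List[str]] = {}
    for start, end in sorted({s for s in owner if s is not None}):
        entity = " ".join(words[start:end])
        result[entity] = [tag for tag, ents in dictionaries if entity in ents]
    return result


doc = "New York is beautiful."
two = [("LOCATION", ["New York"]), ("CITY", ["York"])]
three = two + [("PLACE", ["New York"])]

print(tag_document(doc, two))    # {'York': ['CITY']}
print(tag_document(doc, three))  # {'New York': ['LOCATION', 'PLACE']}
```

With two dictionaries, the later single-word match "York" replaces the earlier "New York" match. Adding a third dictionary that contains "New York" restores the multi-word term, which then carries the tags of the first and third dictionaries.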

Input Ports

  1. Type: Data
    The input table containing the documents to tag.
  2. Type: Data
    The input table containing one or multiple dictionary columns.

Output Ports

  1. Type: Data
    An output table containing the tagged documents.

Extension

This node is part of the extension KNIME Textprocessing v4.0.0.
