This workflow shows how to parse file content with the Tika nodes, detect the language of that content with the Tika language detector, and assign a part-of-speech (POS) tag to each English word found in the files. First, the Tika parser reads files from a specified directory and parses their content; any detected attachments or embedded files are extracted as well. A language detector node then identifies the language of each file's content, and any file not written in English is filtered out. The remaining files are converted into documents, and a Stanford tagger assigns a POS tag to each term.
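The data flow described above (parse, detect language, keep English, tag) can be sketched outside KNIME in a few lines of Python. This is only an illustration under stated assumptions: the names `ParsedFile`, `detect_language`, `pos_tag`, and `run_pipeline` are hypothetical, the stopword-counting detector is a toy stand-in for the Tika language detector, and the lexicon lookup is a toy stand-in for the Stanford tagger. None of it reflects how the KNIME nodes are implemented.

```python
from dataclasses import dataclass


@dataclass
class ParsedFile:
    """Hypothetical stand-in for one row produced by the parsing step."""
    path: str
    content: str


# Toy stopword lists (assumption): a real detector uses far richer models.
STOPWORDS = {
    "en": {"the", "is", "and", "of", "a", "on"},
    "de": {"der", "die", "und", "ist", "das", "auf"},
}

# Toy tag lexicon (assumption): a real tagger uses a trained statistical model.
LEXICON = {"the": "DT", "a": "DT", "cat": "NN", "mat": "NN",
           "sits": "VBZ", "on": "IN"}


def detect_language(text):
    """Return the language whose stopwords occur most often in the text.

    Ties resolve to the first language listed in STOPWORDS.
    """
    words = text.lower().split()
    scores = {lang: sum(w in stop for w in words)
              for lang, stop in STOPWORDS.items()}
    return max(scores, key=scores.get)


def pos_tag(text):
    """Tag each whitespace-separated term; unknown terms default to NN."""
    return [(term, LEXICON.get(term.lower(), "NN")) for term in text.split()]


def run_pipeline(files):
    """Drop non-English files, then tag the terms of the remaining ones."""
    english = [f for f in files if detect_language(f.content) == "en"]
    return {f.path: pos_tag(f.content) for f in english}


files = [
    ParsedFile("a.txt", "The cat sits on the mat"),
    ParsedFile("b.txt", "Die Katze ist auf der Matte"),
]
result = run_pipeline(files)
for path, tagged in result.items():
    print(path, tagged)
```

In this sketch the German file is dropped before tagging, mirroring the filter step that sits between language detection and the tagger in the workflow.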
Created with KNIME Analytics Platform version 4.1.0