This workflow shows how to parse file content with the Tika nodes, detect the language of that content with the Tika language detector, and assign a part-of-speech (POS) tag to each English word found in the files. First, the Tika parser reads files from a specified directory and parses their content; any attachments or embedded files it detects are extracted as well. A language detector node then identifies the language of each file's content, and any file not written in English is filtered out. The remaining files are converted into documents, and a Stanford tagger assigns a POS tag to each term.
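The four steps described above (parse, detect language, filter, tag) can be sketched outside KNIME as a small pipeline. This is only an illustration of the data flow, not the workflow's actual implementation: the names `run_pipeline`, `toy_parse`, `toy_detect` and `toy_tag` are hypothetical, and the toy functions are crude stand-ins for the Tika parser, the Tika language detector and the Stanford tagger.

```python
from typing import Callable, Dict, Iterable, List, Tuple

TaggedTerms = List[Tuple[str, str]]  # (term, POS tag) pairs


def run_pipeline(
    paths: Iterable[str],
    parse: Callable[[str], str],
    detect_language: Callable[[str], str],
    tag: Callable[[str], TaggedTerms],
    keep_language: str = "en",
) -> Dict[str, TaggedTerms]:
    """Parse each file, keep only files in keep_language, then POS-tag them."""
    tagged: Dict[str, TaggedTerms] = {}
    for path in paths:
        content = parse(path)  # step 1: extract the text content
        if detect_language(content) != keep_language:
            continue  # steps 2-3: detect the language, drop non-matching files
        tagged[path] = tag(content)  # step 4: assign a POS tag to each term
    return tagged


# Hypothetical in-memory "directory" so the sketch runs without real files.
FAKE_FILES = {"a.txt": "the cat sleeps", "b.txt": "der Hund schläft"}


def toy_parse(path: str) -> str:
    # Stand-in for a real parser: just look the content up.
    return FAKE_FILES[path]


def toy_detect(text: str) -> str:
    # Stand-in for a real language detector: a single-word heuristic.
    return "en" if "the" in text.split() else "de"


def toy_tag(text: str) -> TaggedTerms:
    # Stand-in for a real tagger: marks "the" as a determiner, the rest as "X".
    return [(word, "DT" if word == "the" else "X") for word in text.split()]


result = run_pipeline(FAKE_FILES, toy_parse, toy_detect, toy_tag)
print(result)
```

Passing the three stages in as callables keeps the order of operations visible and makes it possible to swap each toy function for a real parser, detector or tagger without changing the pipeline itself.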
Workflow
Apache Tika integration
Used extensions & nodes
Created with KNIME Analytics Platform version 4.1.0