
Apache Tika integration

NLP, Natural Language Processing, Tika
Draft. Latest edits on Aug 30, 2016 2:58 PM
This workflow shows how to parse the content of files using the Tika nodes, detect the language of that content using the Tika language detector, and finally assign a part-of-speech (POS) tag to each English word found in the files. First, the Tika parser reads files from a specified directory and parses their content (any detected attachments or embedded files are extracted as well). A language detector node then detects the language of each file's content, and any file not written in English is filtered out. The remaining files are converted into documents, and a Stanford tagger is applied to assign a POS tag to each term.
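The data flow described above (parsed text, language detection, English-only filter, POS tagging) can be sketched in a few lines of Python. This is only an illustration of the stages, not how the KNIME nodes are implemented: the stopword-overlap heuristic is a crude stand-in for the Tika language detector, the lookup table is a toy stand-in for the Stanford tagger, and every name here (`STOPWORDS`, `LEXICON`, `detect_language`, `pos_tag`, `run_pipeline`) is hypothetical. The sketch also assumes the file contents have already been extracted to plain text, which is the job of the Tika parser in the workflow.

```python
import re

# Tiny stopword lists: a crude stand-in for the Tika language detector.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "a", "for"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"},
    "fr": {"le", "la", "les", "et", "est", "un", "une", "des"},
}

# Toy lexicon with Penn Treebank tag names: a stand-in for the Stanford tagger.
LEXICON = {
    "the": "DT", "a": "DT", "is": "VBZ", "and": "CC",
    "of": "IN", "in": "IN", "for": "IN", "to": "TO",
}


def tokenize(text):
    """Split text into lowercase word tokens (letters only)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def detect_language(text):
    """Return the language whose stopwords occur most often, or None if none occur."""
    tokens = tokenize(text)
    scores = {
        lang: sum(tok in words for tok in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def pos_tag(text):
    """Tag each token via the toy lexicon; unknown words get the placeholder 'UNK'."""
    return [(tok, LEXICON.get(tok, "UNK")) for tok in tokenize(text)]


def run_pipeline(texts):
    """Keep only English texts and tag their terms.

    `texts` maps a file name to its already-extracted plain text.
    """
    return {
        name: pos_tag(text)
        for name, text in texts.items()
        if detect_language(text) == "en"  # drop every non-English file
    }


if __name__ == "__main__":
    parsed = {
        "report.txt": "The report is in the folder",
        "bericht.txt": "Der Bericht ist nicht da",
    }
    for name, tags in run_pipeline(parsed).items():
        print(name, tags)
```

In the sample input, only `report.txt` passes the language filter, mirroring how the workflow discards non-English files before tagging. A real detector and tagger rely on statistical models rather than word lists, so the heuristics here should not be used beyond illustrating the order of the steps.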

Used extensions & nodes

Created with KNIME Analytics Platform version 4.1.0
  • KNIME Core (Trusted extension)
    KNIME AG, Zurich, Switzerland
    Version 4.1.0
  • KNIME Textprocessing (Trusted extension)
    KNIME AG, Zurich, Switzerland
    Version 4.1.0

Legal

By using or downloading the workflow, you agree to our terms and conditions.
