Node / Source

Tika Parser

Other Data Types Text Processing IO Streamable

Apache Tika is a library used mainly to detect document types and to extract text content and metadata from a wide range of file formats. Internally, Tika delegates parsing and type detection to existing document parsers and type-detection libraries, and exposes them through a single generic API that serves as a universal type detector and content extractor. For more information about Tika, see the Tika website.
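The delegation idea described above can be illustrated with a minimal Python sketch. It is not Tika's code or API: the signature table, the parser functions, and the names `detect_type` and `extract_text` are all hypothetical, and only two formats are handled. It shows one generic entry point that detects a type from the leading bytes and hands the data to whichever format-specific parser is registered for that type.

```python
import io
import zipfile
from typing import Callable, Dict

# Hypothetical signature table; a real detector knows far more formats.
MAGIC: Dict[bytes, str] = {
    b"PK\x03\x04": "application/zip",
    b"%PDF-": "application/pdf",
}


def detect_type(data: bytes) -> str:
    """Guess a MIME type from the leading bytes, defaulting to plain text."""
    for signature, mime in MAGIC.items():
        if data.startswith(signature):
            return mime
    return "text/plain"


def parse_plain_text(data: bytes) -> str:
    """Decode the bytes as UTF-8 text."""
    return data.decode("utf-8", errors="replace")


def parse_zip(data: bytes) -> str:
    """Return the archive's entry names, one per line."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return "\n".join(archive.namelist())


# Registry mapping a detected type to the parser that handles it.
PARSERS: Dict[str, Callable[[bytes], str]] = {
    "text/plain": parse_plain_text,
    "application/zip": parse_zip,
}


def extract_text(data: bytes) -> str:
    """Single generic entry point: detect the type, then delegate."""
    mime = detect_type(data)
    parser = PARSERS.get(mime)
    if parser is None:
        raise ValueError(f"no parser registered for {mime}")
    return parser(data)
```

Callers only ever use `extract_text`; supporting a new format in this sketch means adding one signature and one registry entry, which is the appeal of a single generic API over many format-specific ones.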

This node parses any document type that Tika supports. The file types to parse are selected in the configuration dialog, either by file extension or by MIME type. The dialog also controls which information (metadata and content) is extracted from each file. Where possible, files embedded in the input files, such as e-mail attachments, can also be extracted and stored in a specified directory. An authentication setting is provided for parsing encrypted files.
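The two ways of selecting input files mentioned above, by extension or by MIME type, can be sketched in Python as follows. This is an illustration only, not the node's implementation: the function names are hypothetical, and the MIME type is guessed from the file name with the standard `mimetypes` module, whereas Tika can also inspect file contents.

```python
import mimetypes
from pathlib import PurePath
from typing import Iterable, List, Set


def filter_by_extension(paths: Iterable[str], extensions: Set[str]) -> List[str]:
    """Keep paths whose extension (case-insensitive, dot optional) is selected."""
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    return [p for p in paths if PurePath(p).suffix.lower().lstrip(".") in wanted]


def filter_by_mime(paths: Iterable[str], mime_types: Set[str]) -> List[str]:
    """Keep paths whose name-based MIME guess is among the selected types."""
    return [p for p in paths if mimetypes.guess_type(p)[0] in mime_types]


files = ["report.pdf", "notes.txt", "book.epub"]
by_extension = filter_by_extension(files, {"pdf", "epub"})
by_mime = filter_by_mime(files, {"application/pdf"})
```

Selecting by MIME type is the broader of the two in practice, since one type can cover several extensions.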

Node details

Output ports
  1. Type: Table
    Metadata output table
    An output table containing the parsed document data. Its columns are the items selected in the Metadata list of the configuration dialog.
  2. Type: Table
    Attachment output table
    An output table listing the names of the input files that contain embedded files, together with the paths of the extracted files in the output directory.
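As a rough picture of the two output tables described above, the following Python sketch builds both from already parsed results. The input structures, column names, and function names are assumptions made for illustration; the node's actual column set depends on what is selected in the dialog.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def build_metadata_table(
    parsed: Dict[str, Dict[str, str]],
    selected_columns: Sequence[str],
) -> List[Dict[str, Optional[str]]]:
    """One row per input file, restricted to the selected metadata columns.

    Metadata items a file does not provide are filled with None.
    """
    return [
        {column: metadata.get(column) for column in selected_columns}
        for metadata in parsed.values()
    ]


def build_attachment_table(
    attachments: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """One row per extracted file: (input file name, path of extracted file).

    Input files without embedded files contribute no rows.
    """
    return [
        (source, extracted)
        for source, extracted_paths in attachments.items()
        for extracted in extracted_paths
    ]
```

Keeping attachments in a separate table means an input file with several embedded files yields several attachment rows without duplicating its metadata row.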

Related workflows & nodes

  1. 03 Tagging
    Text mining Text processing POS
    L4-TP SELF-PACED COURSE exercise. Apply parts-of-speech, named entity, and wildcard taggi…
    knime > Education > Self-Paced Courses > L4-TP Introduction to Text Processing > Exercises > 03 Tagging
  2. 03 Tagging
    Text mining Text processing Tag
    L4-TP SELF-PACED COURSE exercise. Apply parts-of-speech, named entity, and wildcard taggi…
    tqthanh168 > Public > Self-Paced Courses > L4-TP Introduction to Text Processing > Exercises > 03 Tagging
  3. 03 Tagging
    Text mining Text processing Tag
    L4-TP SELF-PACED COURSE exercise. Apply parts-of-speech, named entity, and wildcard taggi…
    oriensraymi > Public > L4-TP Introduction to Text Processing > Exercises > 03 Tagging
  4. 03 Tagging
    Text mining Text processing Tag
    L4-TP SELF-PACED COURSE exercise. Apply parts-of-speech, named entity, and wildcard taggi…
    oriensraymi > Public > L1-DS KNIME Analytics Platform for Data Scientists - Basics > L4-TP Introduction to Text Processing > Exercises > 03 Tagging
  5. 03 Tagging
    Text mining Text processing POS
    L4-TP SELF-PACED COURSE exercise. Apply parts-of-speech, named entity, and wildcard taggi…
    bashir82 > Public > L4-TP Introduction to Text Processing > Exercises > 03 Tagging
  6. 03 Tagging
    Text mining Text processing POS
    L4-TP SELF-PACED COURSE exercise. Apply parts-of-speech, named entity, and wildcard taggi…
    halverdog > Public > L4-TP Introduction to Text Processing > Exercises > 03 Tagging
  7. 03 Tagging
    Text mining Text processing POS
    L4-TP SELF-PACED COURSE exercise. Apply parts-of-speech, named entity, and wildcard taggi…
    manuel1972 > Public > Self-Paced Courses > L4-TP Introduction to Text Processing > Exercises > 03 Tagging
  8. 03 Tagging
    Text mining Text processing POS
    L4-TP SELF-PACED COURSE exercise. Apply parts-of-speech, named entity, and wildcard taggi…
    enekogonzalezme > Public > L4-TP Introduction to Text Processing > Exercises > 03 Tagging
  9. Chapter 2/Exercise 2 - Read Romeo and Juliet
    Books From Words To Wisdom Exercise
    Read the content of all epub books (pg1513.epub) from the folder, Thedata. One of the boo…
    vincenzo > Public > From_Words_To_Wisdom_Book > Chapter2 > Exercises > Exercise 2. Romeo and Juliet
  10. Chapter 3/Exercise 1. Filtering - Punctuation Erasure and Stop Word Filter for "Romeo and Juliet"
    Books From Words To Wisdom Exercise
    Read the content of the epub book "Romeo and Juliet" from Thedata\pg1513.epub file and re…
    vincenzo > Public > From_Words_To_Wisdom_Book > Chapter3 > Exercises > Exercise 1. Punctuation_Erasure_and_Stop_Word_Filter

© 2022 KNIME AG. All rights reserved.