Node / Manipulator

Tika Parser URL Input

Other Data Types Text Processing Misc Streamable

This node has the same function as the Tika Parser node: it parses any document format supported by Apache Tika. The difference is that this node reads the file paths from a string column of the input table. The file types to parse can be selected in the configuration dialog, either by file extension or by MIME type. The dialog also lets you choose which information to extract from each file (metadata and content). Where the format allows it, files embedded in the input files, such as e-mail attachments, can also be extracted and stored in a specified directory. An authentication setting is provided for parsing encrypted files.
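The choice between selecting file types by extension or by MIME type can be illustrated with a small filter. This is a minimal sketch, not the node's implementation: the helper `select_files` and its parameters are hypothetical, and the MIME type is guessed from the file name only with Python's `mimetypes` module, whereas Tika can also detect types from file content.

```python
import mimetypes
from pathlib import PurePath


def select_files(paths, extensions=None, mime_types=None):
    """Keep only the paths whose type matches the selection.

    Hypothetical helper: pass either `extensions` (e.g. {"pdf", "docx"})
    or `mime_types` (e.g. {"application/pdf"}), mirroring the
    extension-or-MIME-type choice described for the dialog.
    """
    if (extensions is None) == (mime_types is None):
        raise ValueError("select exactly one of extensions or mime_types")
    # Normalize the wanted extensions once: lower case, no leading dot.
    wanted_ext = {e.lower().lstrip(".") for e in (extensions or ())}
    selected = []
    for p in paths:
        if extensions is not None:
            # Compare the file suffix case-insensitively.
            ext = PurePath(p).suffix.lower().lstrip(".")
            if ext in wanted_ext:
                selected.append(p)
        else:
            # Guess the MIME type from the name only (no content sniffing).
            mime, _ = mimetypes.guess_type(p)
            if mime in mime_types:
                selected.append(p)
    return selected


paths = ["/data/report.pdf", "/data/notes.txt", "/data/SCAN.PDF"]
print(select_files(paths, extensions={"pdf"}))
print(select_files(paths, mime_types={"application/pdf"}))
```

Filtering by extension is cheap and predictable; filtering by MIME type groups together formats that share a type under different suffixes, at the cost of depending on a type registry.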

Node details

Input ports
  1. Type: Table
    Table containing the file paths
    The input table containing the URLs or paths to files that are to be parsed. The input table has to contain at least one String column.
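Because the string column may hold either plain paths or URLs, a pre-processing step typically normalizes each cell to a local path before parsing. The sketch below is an assumption about how such normalization could look, not the node's behavior: the helper `to_local_path` is hypothetical, and it only handles plain paths and `file:` URLs, rejecting other schemes such as `http`.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def to_local_path(value: str) -> Path:
    """Turn a plain path or a file: URL from a string cell into a Path.

    Hypothetical helper; remote URL schemes are rejected because this
    sketch only covers files on the local file system.
    """
    parsed = urlparse(value)
    if parsed.scheme == "file":
        # Decode percent-escapes such as %20 into the real file name.
        return Path(url2pathname(parsed.path))
    if parsed.scheme == "" or len(parsed.scheme) == 1:
        # No scheme, or a Windows drive letter such as "C:" read as a scheme.
        return Path(value)
    raise ValueError(f"unsupported URL scheme: {parsed.scheme}")


cells = ["/data/report.pdf", "file:///data/My%20Report.pdf"]
print([to_local_path(c) for c in cells])
```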
Output ports
  1. Type: Table
    Metadata output table
    An output table containing the parsed document data. Its columns correspond to the metadata fields selected in the Metadata list of the configuration dialog.
  2. Type: Table
    Attachment output table
    An output table listing the input files that contain embedded files, together with the paths of the files extracted from them into the output directory.
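The attachment output table pairs each input file with the files extracted from it. As an analogy only, the sketch below produces rows of the same shape for e-mail messages using Python's standard `email` package instead of Tika; the helper `extract_attachments` and the output naming scheme are hypothetical, and only `.eml` (RFC 822) files are handled.

```python
import email
from email import policy
from pathlib import Path


def extract_attachments(eml_paths, output_dir):
    """Save e-mail attachments and return (input file, extracted file) rows.

    Hypothetical helper: mirrors the idea of the attachment output table,
    but uses Python's email package rather than Tika.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    rows = []
    for src in eml_paths:
        with open(src, "rb") as fh:
            msg = email.message_from_binary_file(fh, policy=policy.default)
        for i, part in enumerate(msg.iter_attachments()):
            data = part.get_payload(decode=True)
            if data is None:
                # Nested multipart parts (e.g. attached messages) are skipped.
                continue
            name = part.get_filename() or f"attachment_{i}"
            # Prefix with the source name and index to avoid name clashes;
            # Path(name).name drops any directory components in the name.
            target = out / f"{Path(src).stem}_{i}_{Path(name).name}"
            target.write_bytes(data)
            rows.append((str(src), str(target)))
    return rows
```

Each returned tuple corresponds to one row: the input file that contained an embedded file, and the path where that file was written.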

Extension

The Tika Parser URL Input node is part of this extension:


Related workflows & nodes

  1. Chapter 2/Exercise 2 - Read Romeo and Juliet
    Books From Words To Wisdom Exercise
    Read the content of all epub books (pg1513.epub) from the folder, Thedata. One of the boo…
    vincenzo > Public > From_Words_To_Wisdom_Book > Chapter2 > Exercises > Exercise 2. Romeo and Juliet
  2. Apache Tika integration
    Books Text Mining From Words To Wisdom
    The goal of this workflow is to show how to use KNIME Tika Integration nodes to parse doc…
    vincenzo > Public > From_Words_To_Wisdom_Book > Chapter2 > 02_Tika_Integration
  3. Extension_Requiring_Nodes
    hayasaka > KNIME Fall Summit Training 2022 > L4-DV Low Code Data Extraction and Visualization > Extension_Requiring_Nodes
  4. Extension_Requiring_Nodes
    chemgirl36 > Public Space > L4-DV Low Code Data Extraction and Visualization > Extension_Requiring_Nodes
  5. Extension_Requiring_Nodes
    knime > Education > Courses > L4-DV Low Code Data Extraction and Visualization > Extension_Requiring_Nodes
  6. 02_Regex_with_PDFs_Solution
    Education
    This workflow shows how to use Tika Parser and PDF parser to read and parse the PDF files…
    hayasaka > KNIME Fall Summit Training 2022 > L4-DV Low Code Data Extraction and Visualization > Session_3 > 02_Solutions > 03.2_Regex_with_PDFs_Solution
  7. 02_Regex_with_PDFs_Solution
    Education
    This workflow shows how to use Tika Parser and PDF parser to read and parse the PDF files…
    chemgirl36 > Public Space > L4-DV Low Code Data Extraction and Visualization > Session_3 > 02_Solutions > 03.2_Regex_with_PDFs_Solution
  8. 02_Regex_with_PDFs_Solution
    Education
    This workflow shows how to use Tika Parser and PDF parser to read and parse the PDF files…
    knime > Education > Courses > L4-DV Low Code Data Extraction and Visualization > Session_3 > 02_Solutions > 03.2_Regex_with_PDFs_Solution
  9. Apache Tika integration
    NLP Natural Language Processing Tika
    The goal of the workflow is to show how to parse content of files using Tika nodes, detec…
    knime > Examples > 08_Other_Analytics_Types > 01_Text_Processing > 16_Tika_Parsing
  10. OCR Foreign Language PDFs with Python and KNIME
    PDF OCR Foreign Language
    This workflow shows you how to OCR a Foreign Language (Japanese, but this can be changed …
    victor_palacios > Public > OCR_Python

© 2023 KNIME AG. All rights reserved.