
Reading Image Based PDFs with Tika Parser

Text processing, Tika parser, OCR, Tess4J
abembenek
Version: Final Version (Latest), created on Sep 26, 2024 8:52 PM

This workflow uses the Tika Parser node to read the characters from a PDF. The PDF appears to be image-based rather than text-based: it has an image at the top of the page that contains some text, followed by additional text below the image. The workflow then uses an Image Reader (Table) node followed by the Tess4J node to perform OCR (optical character recognition) on the characters in the PDF.
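
The description implies a common pattern for PDFs that mix a text layer with text embedded in images: keep whatever the text parser (here, the Tika Parser) returns, and run OCR on the page when that text is missing or the page contains images. The sketch below illustrates only that routing and merging logic in Python; it is not the KNIME workflow itself. The `Page` structure, the 20-character `MIN_TEXT_CHARS` cut-off, and the `ocr` callback (which stands in for an OCR engine such as Tesseract via Tess4J) are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical cut-off: a page whose text layer has fewer non-whitespace
# characters than this is treated as image-based.
MIN_TEXT_CHARS = 20


@dataclass
class Page:
    number: int
    text_layer: str   # text a parser such as Tika returned for this page
    has_images: bool  # whether the page embeds raster images


def _norm(line: str) -> str:
    # Collapse whitespace and ignore case so equivalent lines compare equal.
    return " ".join(line.split()).lower()


def needs_ocr(page: Page, min_chars: int = MIN_TEXT_CHARS) -> bool:
    # OCR when the text layer is (nearly) empty or when images may hide text.
    visible = sum(not ch.isspace() for ch in page.text_layer)
    return visible < min_chars or page.has_images


def merge(text_layer: str, ocr_text: str) -> str:
    # Keep the text layer, then append OCR lines it does not already contain.
    # Reading order between image text and layer text is NOT reconstructed.
    lines = [ln.strip() for ln in text_layer.splitlines() if ln.strip()]
    seen = {_norm(ln) for ln in lines}
    for ln in ocr_text.splitlines():
        key = _norm(ln)
        if key and key not in seen:
            seen.add(key)
            lines.append(ln.strip())
    return "\n".join(lines)


def extract(pages: List[Page], ocr: Callable[[int], str]) -> List[str]:
    # `ocr` is a hypothetical callback that renders page `number` to an image
    # and returns the recognized text; it is only called when needed.
    return [
        merge(p.text_layer, ocr(p.number)) if needs_ocr(p) else merge(p.text_layer, "")
        for p in pages
    ]


if __name__ == "__main__":
    # Illustrative data only: OCR finds a heading inside the image that the
    # text layer does not contain, plus a line the text layer already has.
    scanned = {1: "ACME Corp\nQuarterly Report\nSales rose 5%"}
    pages = [Page(1, "Sales rose 5%\n", has_images=True)]
    print(extract(pages, lambda n: scanned[n]))
    # prints ['Sales rose 5%\nACME Corp\nQuarterly Report']
```

Deduplicating on normalized lines keeps text that both the parser and OCR recognized from appearing twice; a real pipeline would likely need fuzzier matching, because OCR output rarely matches the text layer character for character.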


Used extensions & nodes

Created with KNIME Analytics Platform version 5.3.1
  • KNIME Image Processing (Trusted extension)

    University of Konstanz / KNIME

    Version 1.8.3

    bioml-konstanz
  • KNIME Image Processing - Tess4J Integration

    University of Konstanz - Jonathan Hale, Christian Dietz

    Version 1.3.3

    bioml-konstanz
  • KNIME Textprocessing (Trusted extension)

    KNIME AG, Zurich, Switzerland

    Version 5.3.1

    knime
