This workflow uses the Tika Parser node to read a PDF whose content appears to be stored as an image rather than as text. The page has an image at the top that itself contains some text, and there is additional text below the image. The workflow then uses an Image Reader (Table) node followed by the Tess4J node to run OCR on the characters in the PDF.
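For readers who want to see the same idea outside KNIME, the following is a minimal Python sketch of the extract-then-OCR flow. It is not the workflow itself: it assumes the third-party packages pypdf (with Pillow) and pytesseract plus a local Tesseract installation, which stand in for the Tika Parser and Tess4J nodes. The function names `ocr_images`, `extract_pdf_images`, and `ocr_pdf` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, List, Tuple


def ocr_images(
    images: Iterable[Tuple[str, object]],
    ocr: Callable[[object], str],
) -> List[Tuple[str, str]]:
    """Run OCR on each (name, image) pair and keep only non-empty results."""
    results = []
    for name, image in images:
        text = ocr(image).strip()
        if text:  # skip images in which no characters were recognized
            results.append((name, text))
    return results


def extract_pdf_images(pdf_path: str) -> List[Tuple[str, object]]:
    """Collect embedded images from a PDF (assumption: pypdf and Pillow installed)."""
    from pypdf import PdfReader  # third-party, imported lazily

    pairs = []
    for page_number, page in enumerate(PdfReader(pdf_path).pages, start=1):
        for img in page.images:
            # img.image is the embedded image decoded as a PIL image
            pairs.append((f"page{page_number}_{img.name}", img.image))
    return pairs


def ocr_pdf(pdf_path: str) -> List[Tuple[str, str]]:
    """Extract images, then OCR them (assumption: pytesseract and Tesseract installed)."""
    import pytesseract  # third-party wrapper around the Tesseract engine

    return ocr_images(extract_pdf_images(pdf_path), pytesseract.image_to_string)
```

Calling `ocr_pdf("scan.pdf")` (the file name is a placeholder) would return a list of image names paired with the text recognized in each image. Keeping the OCR step as a plain callable makes it possible to swap in a different engine or a test stub.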
Workflow
Reading Image Based PDFs with Tika Parser
Used extensions & nodes
Created with KNIME Analytics Platform version 5.3.1
KNIME Image Processing - Tess4J Integration
University of Konstanz - Jonathan Hale, Christian Dietz
Version 1.3.3