Reading Image Based PDFs with Tika Parser




This workflow uses the Tika Parser node to read the characters from a PDF. The PDF appears to be image-based rather than text-based: it has an image at the top of the page that contains some text, with additional text below the image. The workflow then uses an Image Reader (Table) node followed by the Tess4J node to run OCR on the characters in the PDF.
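For readers who think in code rather than nodes, the flow described above can be sketched roughly as follows. This is only an illustration of the idea, not how the KNIME nodes are implemented. The function read_image_based_pdf and its three callable parameters (parse_pdf, extract_images, ocr_image) are hypothetical stand-ins for the Tika Parser, Image Reader (Table), and Tess4J steps; none of them is a real Tika or Tess4J API.

```python
from typing import Callable, Iterable, List


def read_image_based_pdf(
    pdf_path: str,
    parse_pdf: Callable[[str], str],
    extract_images: Callable[[str], Iterable[bytes]],
    ocr_image: Callable[[bytes], str],
) -> List[str]:
    """Collect text from a PDF's text layer and from OCR of its images.

    All three callables are hypothetical placeholders supplied by the caller:
      parse_pdf      -> stands in for the Tika Parser step
      extract_images -> stands in for the Image Reader (Table) step
      ocr_image      -> stands in for the Tess4J OCR step
    """
    chunks: List[str] = []

    # Step 1: ask the parser for whatever text layer the PDF has.
    # For an image-based PDF this is often empty or only whitespace.
    text_layer = parse_pdf(pdf_path).strip()
    if text_layer:
        chunks.append(text_layer)

    # Steps 2 and 3: read each image and run OCR on it,
    # keeping only results that contain recognized characters.
    for image in extract_images(pdf_path):
        recognized = ocr_image(image).strip()
        if recognized:
            chunks.append(recognized)

    return chunks
```

Passing the three steps in as callables keeps the sketch independent of any particular parser or OCR engine, so each step can be swapped or stubbed out when testing.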

