PDF Parser

Node / Source

This node reads PDF documents and creates one document per file. Each document's title and authors are extracted from the PDF's metadata. The full text of the PDF is extracted; the structure of the PDF is not taken into account. Text extraction uses the PDFBox library (see http://pdfbox.apache.org/ for details).
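
For readers who want to see what this kind of extraction looks like outside the node, the following is a minimal sketch, not the node's actual implementation. It assumes PDFBox 2.x is on the classpath (PDFBox 3.x replaces `PDDocument.load` with `Loader.loadPDF`) and Java 16 or later for the record syntax. The `PdfParseSketch` class and the `ParsedPdf` record are hypothetical names introduced only for illustration.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfParseSketch {

    /** Hypothetical holder for one parsed file; this is not a KNIME class. */
    record ParsedPdf(String title, String author, String text) {}

    /** Reads the title and author from the metadata and the plain text from all pages. */
    static ParsedPdf parse(File pdfFile) throws IOException {
        // Assumption: PDFBox 2.x API. In PDFBox 3.x, use Loader.loadPDF(pdfFile) instead.
        try (PDDocument pdf = PDDocument.load(pdfFile)) {
            PDDocumentInformation info = pdf.getDocumentInformation();
            // Plain text of all pages; headings, columns, and tables are not preserved.
            String text = new PDFTextStripper().getText(pdf);
            // Title and author may be null if the PDF carries no such metadata.
            return new ParsedPdf(info.getTitle(), info.getAuthor(), text);
        }
    }

    public static void main(String[] args) throws IOException {
        // One parsed document per input file, mirroring the node's one-document-per-file output.
        for (String path : args) {
            ParsedPdf doc = parse(new File(path));
            System.out.printf("%s | %s | %d characters%n",
                    doc.title(), doc.author(), doc.text().length());
        }
    }
}
```

In this sketch the author field is returned exactly as stored in the metadata; how the node splits it into individual authors is not described here.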

Node details

Output ports

Type: Table
Documents output table
An output table containing the parsed document data.

Extension

The PDF Parser node is part of this extension:
