
Extract Table from PDF with the help of R "tabulizer" and KNIME

Pdf Table Extract Tabulizer R
mlauber71

I had problems with the KNIME-only approaches mentioned, so I tried a combination of KNIME and R. The workflow has these steps:

  • configure and run R's "tabulizer"
  • the settings 'stream' and GUESS seem to work best in your case
  • it extracts one table from each page, tries to find the headers and brings them into a table
  • not all information ends up in the same columns (we come to that later)
  • the tables are saved as single CSV files (with their varying structure)
  • they are then imported into KNIME, forcing all columns to be strings, and combined into a single table
  • the text fields whose information is spread over three columns are merged
  • the summary lines with the Credit balance are separated
  • a single ID for each transaction block is created and distributed to the rows of that block
  • the "our reference" field is extracted and stored in a separate column (you might do that with other information as well)
  • the remaining "communication" is brought into one cell
  • all the information is put together and can be stored

Of course, you might do further manipulations, such as converting the sums into numbers or introducing checks against the separate balances. If your columns vary a lot, you might have to alter the workflow and change the definitions in R.
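To make the KNIME-side steps easier to follow, here is a minimal Python sketch of the same idea using only the standard library. It is not the workflow's actual node configuration: the column names `Date` and `Text`, the `block_id` field, the rule "a filled `Date` cell starts a new transaction block", and the pattern used to find the "our reference" value are all assumptions for illustration and would need to be adapted to your PDF.

```python
import csv
import io
import re

# Hypothetical pattern for the "our reference" line; adjust to your data.
REFERENCE = re.compile(r"our reference[:\s]+(\S+)", re.IGNORECASE)


def combine_tables(csv_texts):
    """Read per-page CSV texts as strings and stack them into one table."""
    columns, rows = [], []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        rows.extend(dict(row) for row in reader)
    # Pad every row to the union of columns; missing cells become "".
    return columns, [{c: (r.get(c) or "") for c in columns} for r in rows]


def split_summary(rows, marker="Credit balance"):
    """Separate summary lines (any cell containing the marker) from the rest."""
    detail, summary = [], []
    for row in rows:
        is_summary = any(marker in value for value in row.values())
        (summary if is_summary else detail).append(row)
    return detail, summary


def assign_block_ids(rows, start_col="Date"):
    """Start a new block when start_col is filled; carry the ID downward."""
    block_id = 0
    for row in rows:
        if row.get(start_col, "").strip():
            block_id += 1
        row["block_id"] = str(block_id)
    return rows


def collapse_blocks(rows, text_col="Text"):
    """One record per block: reference in its own field, other text joined."""
    blocks = {}
    for row in rows:
        block = blocks.setdefault(
            row["block_id"], {"reference": "", "communication": []}
        )
        text = row.get(text_col, "").strip()
        match = REFERENCE.search(text)
        if match:
            block["reference"] = match.group(1)
        elif text:
            block["communication"].append(text)
    return {
        key: {
            "reference": value["reference"],
            "communication": " ".join(value["communication"]),
        }
        for key, value in blocks.items()
    }


# Two made-up "pages" with different column sets, as the CSV export would give.
page1 = "Date,Text,Amount\n01.02.,Payment A,10.00\n,our reference: X123,\n,thank you,\n"
page2 = "Date,Text,Balance\n02.02.,Payment B,\n,Credit balance,55.00\n"

columns, table = combine_tables([page1, page2])
detail, summary = split_summary(table)
records = collapse_blocks(assign_block_ids(detail))
```

Reading everything as strings first mirrors the "force all columns to strings" step: it avoids type conflicts between pages whose columns differ, and number conversion can happen once the blocks are assembled.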

External resources

  • Need to extract tables from a pdf using R
  • How to extract tabular data from PDFs with R
  • KNIME forum - Extract Tables from PDF
  • Introduction to tabulizer
  • forum entry

Used extensions & nodes

Created with KNIME Analytics Platform version 4.2.1
  • KNIME Base nodes (trusted extension), KNIME AG, Zurich, Switzerland, version 4.2.2
  • KNIME Excel Support (trusted extension), KNIME AG, Zurich, Switzerland, version 4.2.1
  • KNIME Interactive R Statistics Integration (trusted extension), KNIME AG, Zurich, Switzerland, version 4.2.0
  • KNIME Javasnippet (trusted extension), KNIME AG, Zurich, Switzerland, version 4.2.0