Challenge 37: Text Deduplication
You are asked to read Swedish textual data from a PDF using the Tika Parser. You then notice that much of the text is duplicated, which could be an encoding issue with the PDF itself. Consequently, you decide to deduplicate the text. In this challenge, do your best to remove excessive amounts of duplicated text using as few nodes as possible. In most cases like this, you are not aiming for perfect removal of duplicated text, but for a cost-effective approach that eliminates a large chunk of the duplication.
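The challenge itself is meant to be solved with KNIME nodes, but the "cheap, good enough" idea can be illustrated outside KNIME. The following is a minimal Python sketch, not the workflow's actual method: it assumes the extracted text can be split into sentences on end-of-sentence punctuation, and that duplicates are exact repeats once case and whitespace are normalized. The function name `dedupe_sentences` is hypothetical.

```python
import re


def dedupe_sentences(text: str) -> str:
    """Keep only the first occurrence of each sentence.

    Assumption: duplicates are exact repeats after normalizing
    case and whitespace. Near-duplicates are not detected.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    seen = set()
    kept = []
    for sentence in sentences:
        # Normalize whitespace and case to build the comparison key.
        key = " ".join(sentence.split()).casefold()
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence)  # keep the original spelling
    return " ".join(kept)


if __name__ == "__main__":
    sample = "Hej världen. Hej världen. Detta är ett test. hej  världen."
    print(dedupe_sentences(sample))
```

This trades accuracy for simplicity: a single pass with a set removes exact repeats, which matches the challenge's goal of eliminating a large chunk of the duplication rather than all of it.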
Just KNIME It _ Challenge 037
Created with KNIME Analytics Platform version 4.5.2