Challenge 37 - Text Deduplication - Solution

Tags: Text processing, Tika parser, OCR, Justknimeit, Justknimeit-37
You are asked to read Swedish textual data from a PDF using the Tika Parser. You then notice that much of the text is duplicated, which could be caused by an encoding issue in the PDF itself, so you decide to deduplicate the text. In this challenge, do your best to remove the excess duplicated text using as few nodes as possible. In cases like this, the goal is usually not perfect removal but a cost-effective approach that eliminates a large share of the duplication. Hint: Our solution consists of 5 nodes, but the 5th node may be unnecessary depending on your workflow.
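
For readers who want to see the "cheap but imperfect" idea outside KNIME, here is a minimal Python sketch. It is not the 5-node workflow and makes no claim about which nodes the solution uses. It assumes the duplication shows up as repeated lines in the extracted text; the function name `dedupe_lines`, the whitespace and case normalization, and the sample string are all illustrative assumptions.

```python
def dedupe_lines(text: str) -> str:
    """Keep the first occurrence of each line, preserving order.

    Assumption: duplicates appear as whole repeated lines. Lines are
    compared after collapsing whitespace and case-folding, so trivially
    different copies are treated as the same line.
    """
    seen = set()
    kept = []
    for line in text.splitlines():
        # Normalize for comparison only; the kept text retains its casing.
        key = " ".join(line.split()).casefold()
        if not key or key in seen:
            continue  # skip blank lines and lines already seen
        seen.add(key)
        kept.append(line.strip())
    return "\n".join(kept)


# Hypothetical sample standing in for text extracted from the PDF.
sample = (
    "Hej världen\n"
    "Hej världen\n"
    "\n"
    "Detta är en text\n"
    "hej  världen\n"
    "Detta är en text"
)
result = dedupe_lines(sample)
print(result)
```

This trades accuracy for cost in the spirit of the challenge: it also removes lines that legitimately repeat in the document, and it misses duplicates that span or split lines differently.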

Used extensions & nodes

Created with KNIME Analytics Platform version 4.6.3
All extensions below are Trusted extensions published by KNIME AG, Zurich, Switzerland.

  • KNIME Base nodes (versions 4.5.2, 4.6.2)
  • KNIME Data Generation (version 4.6.0)
  • KNIME Deep Learning - Keras Integration (version 4.6.0)
  • KNIME Ensemble Learning Wrappers (version 4.6.0)
  • KNIME H2O Machine Learning Integration (version 4.6.0)
  • KNIME Integrated Deployment (version 4.6.0)
  • KNIME JavaScript Views (version 4.6.2)
  • KNIME Javasnippet (version 4.6.0)
  • KNIME Machine Learning Interpretability Extension (version 4.6.0)
  • KNIME Math Expression (JEP) (version 4.6.0)
  • KNIME Optimization extension (version 4.6.0)
  • KNIME PMML Preprocessing Applier Nodes (version 4.6.0)
  • KNIME Quick Forms (version 4.6.0)
  • KNIME Textprocessing (version 4.5.0)
  • KNIME XGBoost Integration (version 4.6.2)

Legal

By using or downloading the workflow, you agree to our terms and conditions.

© 2023 KNIME AG. All rights reserved.