
Links From Text

DSpace, Web archiving, Link extraction
dagrenzer
Draft. Latest edits on Aug 4, 2021 6:34 AM
This workflow makes an initial assessment of the number and types of links in a set of documents. First, it takes a list of URLs for PDF documents in a DSpace repository and retrieves the corresponding extracted text file for each one. The text files are then searched for HTTP links, and the domain of each link is used to identify links to "web at large" resources, as opposed to scholarly publications. A final check identifies whether any links point to materials in web archives. The outputs are saved as separate sheets in a single Excel file.
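The link-extraction and classification steps described above can be sketched outside KNIME in a few lines of Python. This is only an illustration under stated assumptions, not the workflow's actual logic: the domain lists `SCHOLARLY_DOMAINS` and `WEB_ARCHIVE_DOMAINS`, the URL pattern, and the function names are all hypothetical, and the DSpace retrieval and Excel export steps are left out.

```python
import re
from urllib.parse import urlparse

# Assumed, illustrative domain lists; the lists used by the workflow are not given here.
SCHOLARLY_DOMAINS = {"doi.org", "dx.doi.org", "arxiv.org"}
WEB_ARCHIVE_DOMAINS = {"web.archive.org", "archive.org", "perma.cc"}

# Simple pattern for HTTP(S) links; stops at whitespace and common closing delimiters.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)


def extract_links(text):
    """Return HTTP(S) links found in text, trimming trailing sentence punctuation."""
    return [match.rstrip(".,;:") for match in URL_PATTERN.findall(text)]


def domain_of(url):
    """Return the lowercased host name of a URL without a leading 'www.'."""
    try:
        host = urlparse(url).hostname or ""
    except ValueError:
        # Malformed URLs (for example, unbalanced IPv6 brackets) get no domain.
        return ""
    return host[4:] if host.startswith("www.") else host


def classify(url):
    """Label a link as 'web archive', 'scholarly', or 'web at large' by its domain."""
    domain = domain_of(url)
    if domain in WEB_ARCHIVE_DOMAINS:
        return "web archive"
    if domain in SCHOLARLY_DOMAINS:
        return "scholarly"
    return "web at large"


if __name__ == "__main__":
    sample = (
        "See https://doi.org/10.1000/xyz and http://www.example.com/page. "
        "Archived at https://web.archive.org/web/2020/http://example.com/."
    )
    for link in extract_links(sample):
        print(classify(link), link)
```

Matching on exact domains keeps the sketch short; a fuller version would likely need a longer, curated list of publisher and archive domains and some handling of subdomains.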

External resources

  • Copy of Workflow in KAUST Research Repository

Used extensions & nodes

Created with KNIME Analytics Platform version 4.4.0
  • KNIME Base nodes (trusted extension): KNIME AG, Zurich, Switzerland, version 4.4.0
  • KNIME Excel Support (trusted extension): KNIME AG, Zurich, Switzerland, version 4.4.0
  • KNIME Javasnippet (trusted extension): KNIME AG, Zurich, Switzerland, version 4.4.0
  • KNIME REST Client Extension (trusted extension): KNIME AG, Zurich, Switzerland, version 4.4.0

Legal

By using or downloading the workflow, you agree to our terms and conditions.
