This workflow makes an initial assessment of the number and types of links in a set of documents. It takes a list of URLs for PDF documents in a DSpace repository and retrieves the corresponding extracted text file for each one. It then searches the text files for HTTP links and uses each link's domain to separate links to "web at large" resources from links to scholarly publications. Finally, it checks whether any of the links point to materials held in web archives. The outputs are saved as separate sheets in a single Excel file.
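The link-checking steps described above can be sketched in plain Python. This is not the KNIME workflow itself. The regular expression, the helper names (`extract_links`, `assess`, `summarize`) and both domain lists are hypothetical choices made for illustration. The DSpace retrieval step is also left out, so the sketch starts from text that has already been extracted.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical domain lists; the workflow's real lists are not given in its description.
SCHOLARLY_DOMAINS = {"doi.org", "dx.doi.org", "arxiv.org", "jstor.org"}
WEB_ARCHIVE_DOMAINS = {"web.archive.org", "archive.today", "archive.ph", "webcitation.org"}

# Rough pattern: http(s):// up to whitespace, an angle bracket, a quote or a closing bracket.
URL_PATTERN = re.compile(r"""https?://[^\s<>"')\]]+""", re.IGNORECASE)


def extract_links(text):
    # Drop sentence punctuation that often trails a URL in running text.
    return [m.rstrip(".,;:") for m in URL_PATTERN.findall(text)]


def domain_of(url):
    # Lower-cased host name with a leading "www." removed.
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def matches(host, domains):
    # True if the host is a listed domain or a subdomain of one.
    return any(host == d or host.endswith("." + d) for d in domains)


def assess(text):
    # One row per link: domain, scholarly vs. web-at-large category, web archive flag.
    rows = []
    for url in extract_links(text):
        host = domain_of(url)
        rows.append({
            "url": url,
            "domain": host,
            "category": "scholarly" if matches(host, SCHOLARLY_DOMAINS) else "web at large",
            "archived": matches(host, WEB_ARCHIVE_DOMAINS),
        })
    return rows


def summarize(rows):
    # Link counts per category, plus the number of links that point into a web archive.
    counts = Counter(r["category"] for r in rows)
    counts["archived"] = sum(r["archived"] for r in rows)
    return dict(counts)


sample = ("See https://doi.org/10.1000/xyz123 and http://www.example.com/page. "
          "Archived at https://web.archive.org/web/2020/http://example.com/.")
print(summarize(assess(sample)))
```

For the sample text, the DOI link counts as scholarly. The other two links count as "web at large", and one of them is flagged as a web archive link. To match the workflow's output, the rows and the summary could be written to separate sheets of one Excel file with pandas' `ExcelWriter`. That step is omitted here to keep the sketch dependency-free.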
Workflow
Links From Text
Created with KNIME Analytics Platform version 4.4.0