Challenge 14 - News Article Scraper & Reader

Tags: Justknimeit, Web scraping, Error handling, Error logs, Workflow control
Just KNIME It
Draft. Latest edits on Aug 19, 2025 2:05 PM

Level: Medium

Description: Julian works as a researcher at a media and journalism research institute in Dublin. Every morning, he scans RSS news feeds from authoritative outlets, such as the BBC, to stay informed and gather content for his research. However, doing this manually is tedious and time-consuming. RSS feeds provide only limited summaries, so Julian often has to click through to each article to extract the full text and fetch the associated images. He needs a way to automate content and image retrieval and organize the results into an interactive news reader.

To help Julian, you decide to build a workflow that reads the BBC World RSS news feed, scrapes the full articles, and extracts the first image of each article along with its caption. The workflow should also allow Julian to view the full text of each scraped news article and its associated image interactively. Can you help Julian automate the process?
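The scraping step described above can be pictured, outside KNIME, with the Python sketch below. It is only an illustration under stated assumptions: article text is assumed to sit in `<p>` elements and the first image in an `<img>` followed by a `<figcaption>`. The class name `FirstImageParser` and the sample HTML are hypothetical, and the real BBC markup may differ.

```python
from html.parser import HTMLParser


class FirstImageParser(HTMLParser):
    """Collect paragraph text, the first image URL, and the first caption."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self.image_url = None
        self.caption = None
        self._in_p = False
        self._in_caption = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p, self._buf = True, []
        elif tag == "img" and self.image_url is None:
            self.image_url = dict(attrs).get("src")  # first image only
        elif tag == "figcaption" and self.caption is None:
            self._in_caption, self._buf = True, []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.paragraphs.append("".join(self._buf).strip())
            self._in_p = False
        elif tag == "figcaption" and self._in_caption:
            self.caption = "".join(self._buf).strip()
            self._in_caption = False

    def handle_data(self, data):
        if self._in_p or self._in_caption:
            self._buf.append(data)


# Hypothetical article markup standing in for a downloaded page.
SAMPLE_HTML = (
    '<article><figure><img src="https://example.org/img/a.jpg.webp">'
    "<figcaption>A caption</figcaption></figure>"
    "<p>First paragraph.</p><p>Second <b>one</b>.</p></article>"
)

parser = FirstImageParser()
parser.feed(SAMPLE_HTML)
parser.close()
full_text = "\n".join(parser.paragraphs)
```

If an article has no image or caption, `image_url` and `caption` simply stay `None`, which matches the "if available" wording of the challenge.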

Beginner-friendly objectives:
1. Read the BBC World RSS news feed, filter out news items that contain "videos" in the URL, and format the date & time info to your liking.
2. Scrape and extract the full text of each news article, along with the first image and its caption (if available).
3. Visualize news article details (e.g., titles, publication date, etc.), as well as the full scraped text and the associated first image and caption (remove ".webp" from image URLs and retain only .jpg files).
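As a rough illustration of the feed filtering, date formatting, and image URL cleanup in these objectives, here is a minimal Python sketch using only the standard library. It is not the workflow's node logic. The sample feed, the `example.org` URLs, and the function names `read_feed` and `clean_image_url` are hypothetical, and the sketch assumes image URLs end in a form like `.jpg.webp`.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical two-item feed standing in for the real RSS response.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>Story A</title>
<link>https://example.org/news/articles/abc123</link>
<pubDate>Tue, 19 Aug 2025 12:05:00 GMT</pubDate></item>
<item><title>Clip B</title>
<link>https://example.org/news/videos/xyz789</link>
<pubDate>Tue, 19 Aug 2025 11:00:00 GMT</pubDate></item>
</channel></rss>"""


def read_feed(rss_text):
    """Return non-video feed items with a reformatted publication date."""
    items = []
    for item in ET.fromstring(rss_text).iter("item"):
        link = item.findtext("link", default="")
        if "videos" in link:
            continue  # drop items whose URL contains "videos"
        # RSS dates use the RFC 822 style, which email.utils can parse.
        published = parsedate_to_datetime(item.findtext("pubDate"))
        items.append({
            "title": item.findtext("title", default=""),
            "link": link,
            "published": published.strftime("%Y-%m-%d %H:%M"),
        })
    return items


def clean_image_url(url):
    """Strip a trailing ".webp" and keep the URL only if a .jpg remains."""
    if url.endswith(".webp"):
        url = url[: -len(".webp")]
    return url if url.lower().endswith(".jpg") else None


articles = read_feed(SAMPLE_RSS)
```

The date format string is a free choice ("to your liking"); `%Y-%m-%d %H:%M` is just one option.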

Intermediate-friendly objectives:
1. Make the selection of each news article and its corresponding image more flexible with widgets, creating an interactive data app that makes reading news more engaging.
2. Add beautification elements to your data app (e.g., a title, a subtitle, emoji, instructions on the intended use, etc.).
3. Web scraping is prone to issues: website structures may change, sites may rate-limit traffic, and the Internet connection may temporarily drop or become unstable. Any of these can break the scraping logic and lead to errors or missing data if not handled carefully. Add error handling so that, for each news article, the scraper deals with errors or missing data gracefully (if the news scraper runs without errors, you can simulate them by temporarily disabling your Internet connection).
4. Log detailed errors for each failed article scraping attempt (e.g., reason for failure, the failing node, etc.). Make sure to also add the date and timestamp of when the error occurred.
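The per-article error handling and error logging asked for here can be sketched in Python as follows. This is an illustration of the general pattern (try each article independently, record failures with a timestamp, keep a missing value instead of aborting), not the workflow's actual implementation. The names `scrape_all` and `flaky_fetch`, the retry count, and the URLs are hypothetical.

```python
from datetime import datetime, timezone


def scrape_all(urls, fetch, retries=2):
    """Scrape each URL independently; return results and an error log.

    `fetch` is a caller-supplied function (hypothetical here) that
    returns the article text or raises an exception.
    """
    results, error_log = {}, []
    for url in urls:
        for attempt in range(1, retries + 1):
            try:
                results[url] = fetch(url)
                break  # success: stop retrying this article
            except Exception as exc:
                error_log.append({
                    "url": url,
                    # Name of the failing step, analogous to the failing node.
                    "step": getattr(fetch, "__name__", "fetch"),
                    "reason": f"{type(exc).__name__}: {exc}",
                    "attempt": attempt,
                    "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
                })
        else:
            results[url] = None  # every attempt failed: keep a missing value
    return results, error_log


def flaky_fetch(url):
    """Stand-in fetcher that simulates a dropped connection for one URL."""
    if "broken" in url:
        raise ConnectionError("simulated network drop")
    return f"full text of {url}"


results, error_log = scrape_all(
    ["https://example.org/ok", "https://example.org/broken"], flaky_fetch
)
```

One failing article does not stop the others: the failed URL ends up with `None` in `results`, and each failed attempt adds one entry to `error_log`.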

Used extensions & nodes

Created with KNIME Analytics Platform version 5.5.1
  • KNIME Base nodes (Trusted extension), KNIME AG, Zurich, Switzerland, Version 5.5.1
  • KNIME Column Expressions (legacy) (Trusted extension), KNIME AG, Zurich, Switzerland, Version 5.5.0
  • KNIME Excel Support (Trusted extension), KNIME AG, Zurich, Switzerland, Version 5.5.0
  • KNIME Expressions (Trusted extension), KNIME AG, Zurich, Switzerland, Version 5.5.1
  • KNIME Image Processing (Trusted extension), University of Konstanz / KNIME, Version 1.8.3
  • KNIME Quick Forms (Trusted extension), KNIME AG, Zurich, Switzerland, Version 5.5.1
  • KNIME REST Client Extension (Trusted extension), KNIME AG, Zurich, Switzerland, Version 5.5.0
  • KNIME SVG Support (Trusted extension), KNIME AG, Zurich, Switzerland, Version 5.5.0
  • KNIME Textprocessing (Trusted extension), KNIME AG, Zurich, Switzerland, Version 5.5.0
  • KNIME Views (Trusted extension), KNIME AG, Zurich, Switzerland, Version 5.5.1
  • KNIME XML-Processing (Trusted extension), KNIME AG, Zurich, Switzerland, Version 5.5.0
