Node / Manipulator

Webpage Retriever

streamable

This node retrieves webpages by issuing HTTP GET requests and parsing the returned HTML. Parsing is done with the jsoup library, which implements the WHATWG HTML5 specification. The parsed HTML is cleaned by removing comments and, optionally, by replacing relative URLs with absolute ones.
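As a rough illustration of the cleaning step described above, the following Python sketch drops comments and rewrites relative URLs as absolute ones. It is not the node's implementation: the node uses jsoup, a Java library, whereas this sketch uses Python's standard-library `HTMLParser`. The names `Cleaner`, `clean_html` and `URL_ATTRS`, the choice of `href`/`src` as the only URL attributes, and the example base URL are all assumptions made for this example.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes treated as URL-bearing in this sketch (an assumption, not jsoup's list).
URL_ATTRS = {"href", "src"}


class Cleaner(HTMLParser):
    """Re-emits HTML without comments and with absolute href/src URLs."""

    def __init__(self, base_url):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []

    def _emit_tag(self, tag, attrs, close=""):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # boolean attribute such as "disabled"
                parts.append(name)
                continue
            if name in URL_ATTRS:
                value = urljoin(self.base_url, value)
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + close + ">")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_comment(self, data):
        pass  # comments are dropped, mirroring the cleaning step


def clean_html(html, base_url):
    """Return html with comments removed and relative URLs made absolute."""
    cleaner = Cleaner(base_url)
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)
```

For example, `clean_html('<p><!-- note --><a href="/docs">Docs</a></p>', 'https://example.com/a/')` yields `<p><a href="https://example.com/docs">Docs</a></p>`.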

By default, the output table contains a column with the parsed HTML converted into XHTML. Alternatively, you can choose to output the parsed HTML as a string.

The node sends requests either to a fixed URL (specified in the dialog) or to a list of URLs provided by an optional input table. Each URL results in exactly one request, which in turn results in one row in the output table. You can define custom request headers in the dialog.
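The one-URL, one-request, one-row behaviour can be sketched in Python as follows. This is only an approximation of the idea, not the node's code: the function names `fetch` and `retrieve_all`, the column names `URL` and `Document`, the 10-second timeout and the UTF-8 decoding are assumptions chosen for the example. The `fetcher` parameter exists so the loop can be tried without network access.

```python
from urllib.request import Request, urlopen


def fetch(url, headers):
    """Issue one HTTP GET with custom headers and return the body as text."""
    request = Request(url, headers=headers, method="GET")
    with urlopen(request, timeout=10) as response:
        # UTF-8 is assumed here; a real client would honour the response charset.
        return response.read().decode("utf-8", errors="replace")


def retrieve_all(urls, headers=None, fetcher=fetch):
    """Return one output row per input URL, in input order."""
    headers = headers or {}
    rows = []
    for url in urls:
        rows.append({"URL": url, "Document": fetcher(url, headers)})
    return rows
```

Passing a single-element list corresponds to the fixed-URL case, while passing a column of URLs corresponds to the optional input table.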

The node supports several authentication methods, e.g. BASIC and DIGEST. Other authentication methods may be provided by additional extensions.
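For orientation, BASIC authentication amounts to sending an `Authorization` header that carries the Base64-encoded `username:password` pair, as defined in RFC 7617. The Python sketch below shows that encoding only; the helper name `basic_auth_header` is hypothetical, and how the node builds the header internally is not documented here.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header for HTTP BASIC authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

DIGEST authentication is more involved because it requires a challenge from the server first, so it is not shown here.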

Cookies can be sent to the server via the Request Header tab by setting the "Cookie" header. To receive cookies, enable the "Extract cookies" option. Any cookies sent by the server are then extracted and appended as a List Cell in the output.
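The two directions of cookie handling can be sketched in Python like this. The helper names `cookie_header` and `extract_cookies` are hypothetical, and the `name=value` format of the extracted list is an assumption, since the exact content of the node's List Cell is not specified here.

```python
from http.cookies import SimpleCookie


def cookie_header(cookies):
    """Build the value of a "Cookie" request header from name/value pairs."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def extract_cookies(set_cookie_values):
    """Turn raw Set-Cookie header values into a list of "name=value" strings."""
    extracted = []
    for raw in set_cookie_values:
        jar = SimpleCookie()
        jar.load(raw)  # parses the cookie plus attributes such as Path
        for name, morsel in jar.items():
            extracted.append(f"{name}={morsel.value}")
    return extracted
```

The result of `cookie_header` is what would be entered as the value of the "Cookie" header, and `extract_cookies` mimics collecting the server's `Set-Cookie` headers into a list.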

External resources

  • KNIME E-Learning Course: Sending a GET Request to a REST service

Node details

Input ports
  1. Table Type: Data
    Optional data table containing the variable parameters of the requests.
Output ports
  1. Webpage Retriever results Type: Data
    Data table containing the parsed HTML, either as a string or as XHTML, and optionally the cookies as a list of strings.

Related workflows & nodes

  1. Query Google For Address
    This workflow takes a single input about a location from a user, converts it to a Google query URL, sends that URL to G…
    scottf > Public > ForumWorkflows > 2019 > 12 > Query_Google_for_address
  2. Query Google For Address
    This workflow takes a single input about a location from a user, converts it to a Google query URL, sends that URL to G…
    mmarag > Public > Query_Google_for_address
  3. Web Scrape Example COVID-19 Outbreak Victoria
    COVID19
    I was interested in learning how to do a basic web scrape for data during lockdown using KNIME. I'm scraping this dat…
    damo_f > Public > KNIME_Webscrape_COVID_Victoria_Numbers
  4. dcif_homepage_scraping_Lösung
    kathrin > dcif_webinar > dcif_homepage_scraping_Lösung
  5. List Images from a Wikipedia Article using Webpage Retriever
    Webpage Retriever wikipedia image +3
    This example workflow shows how the Webpage Retriever node can be used to create a list of images retrieved from the Wi…
    knime > Examples > 01_Data_Access > 05_REST_Web_Services > 08_List_Images_From_Wikipedia
  6. List Images from a Wikipedia Article using Webpage Retriever
    Webpage Retriever wikipedia image +3
    This example workflow shows how the Webpage Retriever node can be used to create a list of images retrieved from the Wi…
    lcs > Public > 01_Data_Access > 05_REST_Web_Services > 08_List_Images_From_Wikipedia
  7. List Images from a Wikipedia Article using Webpage Retriever
    Webpage Retriever
    This example workflow shows how the Webpage Retriever node can be used to create a list of images retrieved from the Wi…
    marcelw > Public > Example_workflows_cleaned_up > 01_Data_Access > 05_REST_Web_Services > 08_List_Images_From_Wikipedia
  8. Extraction and Tag Cloud Visualization of Named Entities from New York Times News Feeds
    NLP Natural Language Processing tagging +1
    The workflow starts with a URL to a NY Times rss news feed. The news feed is downloaded and parsed and transformed in D…
    knime > Examples > 08_Other_Analytics_Types > 01_Text_Processing > 06_NY_Times_RSS_Feed_Tag_Cloud
  9. Will they blend? MS Word meets Web Crawling.
    NLP Natural Language Processing
    On one side we have a list of cookie recipes saved in a Word Document on the local machine. On the other side we have a…
    knime > Examples > 08_Other_Analytics_Types > 01_Text_Processing > 10_Discover_Secret_Ingredient
  10. Will they blend? MS Word meets Web Crawling.
    NLP Natural Language Processing
    On one side we have a list of cookie recipes saved in a Word Document on the local machine. On the other side we have a…
    sameerb > Public > 10_Discover_Secret_IngredientSB


Extension

This node is part of the extension

KNIME REST Client Extension (trusted extension)
Version 4.3.0

© 2021 KNIME AG. All rights reserved.