Node / Manipulator

Webpage Retriever

Tools & Services, REST Web Services, Streamable

This node retrieves webpages by issuing HTTP GET requests and parsing the returned HTML. Parsing is done with the jsoup library, which implements the WHATWG HTML5 specification. The parsed HTML is cleaned by removing comments and, optionally, replacing relative URLs with absolute ones.
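The cleaning step described above can be illustrated with a minimal sketch. It does not use jsoup and is not the node's implementation: it relies on Python's standard `html.parser`, and the names `Cleaner`, `clean_html`, and `URL_ATTRS` are hypothetical. It assumes that only `href` and `src` attributes carry URLs, it drops the doctype, and it escapes text inside `<script>` and `<style>`, which a real HTML5 parser would handle differently.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: only these attributes are treated as URLs to be made absolute.
URL_ATTRS = {"href", "src"}


class Cleaner(HTMLParser):
    """Re-serializes HTML, dropping comments and absolutizing URL attributes."""

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.parts = []

    def _open_tag(self, tag, attrs, closing):
        rendered = []
        for name, value in attrs:
            if value is None:  # attribute without a value, e.g. "disabled"
                rendered.append(f" {name}")
                continue
            if name in URL_ATTRS:
                value = urljoin(self.base_url, value)
            rendered.append(f' {name}="{escape(value, quote=True)}"')
        self.parts.append(f"<{tag}{''.join(rendered)}{closing}")

    def handle_starttag(self, tag, attrs):
        self._open_tag(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._open_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(escape(data, quote=False))

    def handle_comment(self, data):
        pass  # comments are removed from the output


def clean_html(html, base_url):
    """Return the HTML without comments and with absolute href/src values."""
    cleaner = Cleaner(base_url)
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.parts)
```

For example, `clean_html('<p><!-- note --><a href="/docs">Docs</a></p>', "https://example.com/a/")` resolves `/docs` against the base URL and omits the comment.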

By default, the output table contains a column with the parsed HTML converted into XHTML. Alternatively, you can choose to output the parsed HTML as a string.

The node sends a request either to a fixed URL (specified in the dialog) or to each URL in a list provided by an optional input table. Each URL results in one request, and each request results in one row in the output table. You can define custom request headers in the dialog.
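The one-request-per-URL, one-row-per-request behaviour can be sketched outside KNIME as follows. This is an illustration using Python's standard `urllib`, not the node's code; `fetch_html`, `retrieve_pages`, and the row layout (`url`, `html`) are hypothetical names chosen for the example. The `fetch` parameter exists only so that the request step can be swapped out.

```python
from urllib.request import Request, urlopen


def fetch_html(request, timeout=10):
    """Send the request and return the response body decoded as text."""
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def retrieve_pages(urls, headers=None, fetch=fetch_html):
    """Issue one GET request per URL and return one row (dict) per request."""
    rows = []
    for url in urls:
        # The same custom headers are attached to every request.
        request = Request(url, headers=headers or {}, method="GET")
        rows.append({"url": url, "html": fetch(request)})
    return rows
```

A call such as `retrieve_pages(["https://example.com/"], {"Accept-Language": "en"})` would perform a real network request and return a list with a single row.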

The node supports several authentication methods, e.g. BASIC and DIGEST. Other authentication methods may be provided by additional extensions.
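As background on what BASIC authentication sends over the wire, the sketch below builds the `Authorization` header defined by the HTTP Basic scheme (RFC 7617). It shows the protocol only and is not how the node is configured; the function name `basic_auth_header` is hypothetical.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header for HTTP Basic authentication."""
    # The credentials are joined with a colon and Base64-encoded (not encrypted).
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Because Base64 is an encoding and not encryption, Basic credentials are only protected when the request is sent over HTTPS.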

Cookies can be sent to the server via the Request Header tab by setting the "Cookie" header. To receive cookies, enable the "Extract cookies" option. Any cookies sent by the server are then extracted and appended as a List Cell in the output.
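The two directions of cookie handling can be sketched with Python's standard `http.cookies` module. This is an illustration under assumptions: the helper names `cookie_header` and `extract_cookies` are hypothetical, and the `name=value` string format used for the extracted list is a guess at a convenient representation, not the node's documented output format.

```python
from http.cookies import SimpleCookie


def cookie_header(cookies):
    """Build the value of a "Cookie" request header from name/value pairs."""
    pairs = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return {"Cookie": pairs}


def extract_cookies(set_cookie_values):
    """Turn the server's Set-Cookie header values into a list of strings."""
    extracted = []
    for raw in set_cookie_values:
        jar = SimpleCookie()
        jar.load(raw)  # parses the cookie and its attributes (Path, Max-Age, ...)
        for name, morsel in jar.items():
            # Assumption: only name and value are kept; attributes are dropped.
            extracted.append(f"{name}={morsel.value}")
    return extracted
```

For instance, `cookie_header({"session": "abc"})` yields the header a user would enter manually in the Request Header tab.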

The node supports the Credential port as input (see dynamic input ports). If the port is added, it must supply a Credential that can be embedded into the HTTP Authorization header, and all requests made by the node will use the Credential from the port, regardless of other node settings. The OAuth2 Authenticator nodes, for example, provide such a Credential.
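The precedence rule described above (a Credential from the port overrides any other authentication setting) can be sketched as a header merge. This is a conceptual illustration, not the node's implementation: the function name `apply_credential` is hypothetical, and it assumes the Credential is an OAuth2 access token sent with the `Bearer` scheme, which is common but not stated by the source.

```python
def apply_credential(headers, port_credential=None):
    """Return request headers, letting a port-supplied credential take priority."""
    merged = dict(headers)
    if port_credential is not None:
        # Drop any Authorization header from other settings (case-insensitive).
        merged = {k: v for k, v in merged.items() if k.lower() != "authorization"}
        # Assumption: the credential is a bearer token (typical for OAuth2).
        merged["Authorization"] = f"Bearer {port_credential}"
    return merged
```

When no credential is supplied, the headers are returned unchanged, which mirrors the case where the optional port has not been added.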

External resources

  • KNIME E-Learning Course: Sending a GET Request to a REST service

Node details

Input ports
  1. Type: Table
    Table
    Optional data table containing the variable parameters of the requests.
Output ports
  1. Type: Table
    Webpage Retriever results
    Data table containing the parsed HTML either as a string or as XHTML and, optionally, the cookies as a list of strings.
Credential (Dynamic Inport)
A Credential that can be embedded into the HTTP Authorization header. If this port is added, all requests made by the node will use the Credential from the port, regardless of other node settings. The OAuth2 Authenticator nodes, for example, provide such a Credential.
  1. Type: org.knime.credentials.base.CredentialPortObject


© 2025 KNIME AG. All rights reserved.