This node transforms URLs into a canonicalized representation, e.g. for matching them in data mining scenarios. It is not intended to produce URLs that still work in all cases. The node performs the following steps:

  • Convert the URL to lower case
  • Add the http:// protocol if none is present
  • Replace the https:// protocol with http://
  • Remove session IDs from URLs
  • Normalize relative path components (e.g. "..")
  • Remove trailing slashes from URLs
  • Remove the "index.htm*" part from URLs
  • Sort query parameters alphabetically
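
The steps listed above can be approximated outside KNIME with a short Python sketch. This is only an illustration under stated assumptions, not the node's actual implementation: the function name canonicalize_url, the SESSION_KEYS list, and the order in which the steps are applied are all hypothetical, since the node description does not specify them.

```python
import posixpath
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: parameter names treated as session IDs; the node's real list may differ.
SESSION_KEYS = ("jsessionid", "phpsessid", "sessionid", "sid")
# Matches session IDs passed as path parameters, e.g. ";jsessionid=abc".
SESSION_IN_PATH = re.compile(r";(?:%s)=[^/;]*" % "|".join(SESSION_KEYS))


def canonicalize_url(url):
    """Approximate the listed canonicalization steps (illustrative only)."""
    url = url.strip().lower()                          # lower case URL
    if not re.match(r"[a-z][a-z0-9+.-]*://", url):     # add http:// if no protocol
        url = "http://" + url
    scheme, netloc, path, query, fragment = urlsplit(url)
    if scheme == "https":                              # https:// -> http://
        scheme = "http"
    path = SESSION_IN_PATH.sub("", path)               # session IDs in the path
    if path:
        path = posixpath.normpath(path)                # resolve "." and ".."
    path = re.sub(r"/index\.htm[^/]*$", "", path)      # drop "index.htm*"
    path = path.rstrip("/")                            # drop trailing slashes
    # Drop session IDs from the query, then sort the remaining parameters.
    params = [p for p in query.split("&")
              if p and p.split("=", 1)[0] not in SESSION_KEYS]
    query = "&".join(sorted(params))
    return urlunsplit((scheme, netloc, path, query, fragment))


print(canonicalize_url("HTTPS://Example.com/a/../b/index.html?z=1&a=2&PHPSESSID=abc"))
```

Because the whole URL is lowercased and https is rewritten to http, the result is suited to matching and deduplication rather than to fetching the page again, which is in line with the note above that the output is not guaranteed to be a working URL.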

Input Ports

  1. Type: Data. Table containing a column with the URLs to process.

Output Ports

  1. Type: Data. Input table with an appended column for the canonicalized URLs.

Find here

Community Nodes > Palladian

Make sure to have this extension installed:

Palladian for KNIME

Update site for KNIME Analytics Platform 3.7:
KNIME Community Contributions (3.7)
