Clean HTML Retriever

Manipulator

This node takes a URL from a column and retrieves its content, which is assumed to be HTML, for parsing. If the HTML content is already available in another column, the node can use that content directly instead of fetching it from the URL. The HTML is then parsed and cleaned up with HtmlCleaner and output as XHTML. The result can be configured to be output as either String or XML type.
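The per-row flow described above (use the HTML column if present, otherwise fetch the URL, then clean the markup into XHTML) can be sketched roughly as follows. This is a minimal illustration only and not the node's implementation: it swaps HtmlCleaner for a much simpler tag balancer built on Python's standard `html.parser`, and the names `XhtmlBuilder`, `to_xhtml`, and the `timeout` parameter are hypothetical. It drops comments and doctypes, always wraps the output in a single `<html>` root, and does not apply HTML's implicit-close rules (for example, an unclosed `<p>` ends up nested rather than closed early).

```python
from html import escape
from html.parser import HTMLParser
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class XhtmlBuilder(HTMLParser):
    """Hypothetical, simplified stand-in for HtmlCleaner: balances tags only."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []  # output fragments, joined in result()
        self.stack = []  # names of elements that are still open

    def _render(self, tag, attrs, closed):
        # dict() drops duplicate attributes; valueless ones reuse their name.
        rendered = "".join(
            ' {}="{}"'.format(name, escape(name if value is None else value))
            for name, value in dict(attrs).items()
        )
        self.parts.append("<{}{}{}>".format(tag, rendered, " /" if closed else ""))

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            return  # a single <html> root is added in result()
        if tag in VOID_TAGS:
            self._render(tag, attrs, closed=True)
        else:
            self._render(tag, attrs, closed=False)
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        if tag != "html":
            self._render(tag, attrs, closed=True)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag: ignore it
        while self.stack:
            name = self.stack.pop()
            self.parts.append("</{}>".format(name))
            if name == tag:
                break

    def handle_data(self, data):
        self.parts.append(escape(data, quote=False))

    def result(self):
        # Close whatever is still open, then wrap everything in one root.
        while self.stack:
            self.parts.append("</{}>".format(self.stack.pop()))
        return "<html>{}</html>".format("".join(self.parts))


def to_xhtml(url, html=None, timeout=10):
    """Return XHTML for one row: use `html` if given, otherwise download `url`."""
    if html is None:
        with urlopen(url, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            html = response.read().decode(charset, errors="replace")
    builder = XhtmlBuilder()
    builder.feed(html)
    builder.close()  # flush any text still buffered by the parser
    return builder.result()
```

The returned value corresponds to the String output option; passing it through `ET.fromstring(...)` gives a parsed tree, which is loosely analogous to choosing the XML output type.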

Input Ports

  1. Port Type: Data
    An input table that contains the URL column and/or the HTML content column

Output Ports

  1. Port Type: Data
    An output table containing the URL and the XHTML results