HtmlParser

Manipulator

This HTML parser is based on Validator.nu.

From the Validator.nu web page: "The Validator.nu HTML Parser is an implementation of the HTML5 parsing algorithm in Java. The parser is designed to work as a drop-in replacement for the XML parser in applications that already support XHTML 1.x content with an XML parser and use SAX, DOM or XOM to interface with the parser. Low-level functionality is provided for applications that wish to perform their own IO and support document.write() with scripting. The parser core compiles on Google Web Toolkit and can be automatically translated into C++. (The C++ translation capability is currently used for porting the parser for use in Gecko.)"

Input Ports

  1. Type: Data
     Input table containing HttpResults, binary object data, or file paths with (X)HTML data to be parsed. Note: Although technically possible, it is not recommended to input http links directly into the parser. Use the HttpRetriever node for downloading instead and input the resulting HttpResults into this node.

Output Ports

  1. Type: Data
     Output table with the parsed (X)HTML documents appended. If a document cannot be parsed, a missing value is appended instead.
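The output contract above (one parsed document per input, or a missing value on failure) can be illustrated with a small Java sketch. It is not the node's actual implementation: the class `ParseOrMissing` and its methods `parseAll` and `quietXmlBuilder` are hypothetical, and `Optional.empty()` stands in for KNIME's missing value. To keep the sketch runnable without extra jars, it uses the JDK's XML `DocumentBuilder`, which rejects typical tag-soup HTML. Validator.nu's `nu.validator.htmlparser.dom.HtmlDocumentBuilder` is documented as a `javax.xml.parsers.DocumentBuilder` subclass, so, assuming that jar is on the classpath, it could presumably be passed in instead; this is the "drop-in replacement" the quote describes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParseOrMissing {

    // Hypothetical helper: parses each input with the given DOM builder and
    // records Optional.empty() (standing in for a "missing value") on failure.
    static List<Optional<Document>> parseAll(DocumentBuilder builder, List<String> inputs) {
        List<Optional<Document>> results = new ArrayList<>();
        for (String markup : inputs) {
            try {
                Document doc = builder.parse(new InputSource(new StringReader(markup)));
                results.add(Optional.of(doc));
            } catch (Exception e) {
                results.add(Optional.empty());
            }
        }
        return results;
    }

    // Hypothetical helper: the JDK's XML parser, used only so the sketch runs
    // without dependencies. DefaultHandler rethrows fatal errors instead of
    // printing them to stderr. With Validator.nu available (assumption), one
    // could pass new nu.validator.htmlparser.dom.HtmlDocumentBuilder() to parseAll.
    static DocumentBuilder quietXmlBuilder() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler());
            return builder;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> inputs = List.of(
                "<html><body><p>Well-formed</p></body></html>",
                "<p>Hello<br>world"); // typical HTML: unclosed <p> and <br>
        List<Optional<Document>> parsed = parseAll(quietXmlBuilder(), inputs);
        for (int i = 0; i < parsed.size(); i++) {
            System.out.println(i + ": " + (parsed.get(i).isPresent() ? "parsed" : "missing value"));
        }
        // prints "0: parsed" then "1: missing value"
    }
}
```

With the XML parser, the second input becomes a missing value; an HTML5 parser such as Validator.nu would instead build a DOM for it, which is why the node relies on it for real-world (X)HTML.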

Find here

Community Nodes > Palladian

Make sure to have this extension installed:

Palladian for KNIME

Update site for KNIME Analytics Platform 3.7:
KNIME Community Contributions (3.7)
