XML Easy Reader 1.10

v1.10 - 10 January 2022

PLEASE NOTE: THIS COMPONENT IS STILL A PROTOTYPE AND SUBJECT TO CHANGE - FEEDBACK WELCOME!

The Python script has been updated with some improvements, and recoded so that it can be tested (in part) outside of KNIME (e.g. using VSCode) for easier development.

Reads the supplied XML file, first treating the specified path as a local file system path; if that fails, it attempts to read it as a URL.

This component uses Python 3, so you must have Python 3 installed and available in your KNIME environment. It makes use of the following Python modules: cElementTree, pandas, urllib

The XML data is output in grouped tabular format, which means that the rows should then be ungrouped (use an Ungroup node). Data items that are expected to be repeated across all rows of a "group" should be excluded from the selection of columns to be ungrouped. In that way, repeated data is "copied down" across row items where appropriate.

The columns and their paths are output on the "Column Paths" port and on the "Path to Column Mapping" port. The "Column Paths" port is keyed by column name, so if there is a column-name clash (which can occur if more than one element in the XML has the same element name), the rows on this port will be incomplete, as will the resulting data output. The "Path to Column Mapping" port shows the same information but is "path centric", so it will include any columns for which a name clash has occurred. The "Column Name Clash" port identifies clashing names. It should return no data if no name clash occurred, and can be used to quickly verify that all expected columns have been handled correctly.

The name of a CSV "Column Name to Path" mapping file can be supplied, which allows you to specify which elements/columns to return, based on their paths. By specifying a different column name here, the column will be renamed on the output. Paths follow a basic "pseudo xpath" format.
No additional xpath syntax should be used, as it will not be recognised and will result in data in the file being ignored.

Element paths are defined by the format
    //element1/element2/element3
Attribute paths are defined by the format
    //element1/element2/element3/@attributename

Rows in the "Column Name to Path" mapping table can be "commented out". All that is necessary is that the path be "invalidated", which can easily be achieved by, for example, adding a '#' to the end of the line. In the following example, the paths for the * and orderperson lines have been "invalidated" and so are ignored:

    Column Name,Path
    *,*#
    Order Id,//shiporder/@orderid
    orderperson,//shiporder/orderperson#

The path will change if you specify a different collection subtree and/or root. If you are having difficulty working out the correct path, execute the node and take a look at the "Column Paths" output port to see what the paths are with the current configuration.

v 1.0 (Prototype) @takbb Brian Bates

This is a prototype, but it is fully functioning and may well be suitable for your needs. If you wish to use it, please test it with your data to see that it works well for you before relying on it! Please provide feedback on any issues found, or any suggestions for improvement or usability.
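
The "local file first, then URL" behaviour described above could be sketched roughly as follows. This is not the component's actual script: the function name read_xml_root is hypothetical, the sketch uses the standard-library xml.etree.ElementTree parser, and the sample XML values are invented purely for the demonstration.

```python
import os
import pathlib
import tempfile
import urllib.request
import xml.etree.ElementTree as ET


def read_xml_root(location):
    """Parse XML from a local path; if that fails, retry treating it as a URL.

    Hypothetical helper, written only to illustrate the fallback order.
    """
    try:
        return ET.parse(location).getroot()
    except OSError:
        # Could not be opened as a local file, so try it as a URL instead.
        with urllib.request.urlopen(location) as response:
            return ET.parse(response).getroot()


# Demonstration with an invented document, read once by path and once by file:// URL.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shiporder.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write('<shiporder orderid="A1"><orderperson>Jane Doe</orderperson></shiporder>')
    local_root = read_xml_root(path)
    url_root = read_xml_root(pathlib.Path(path).as_uri())

print(local_root.tag, url_root.get("orderid"))
```

Catching only OSError means a file that exists but contains malformed XML still raises a parse error rather than silently being retried as a URL; whether the real component behaves this way is not stated in the description.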
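
To illustrate how the mapping file and its "commented out" rows might be interpreted, here is a minimal sketch using the example mapping from the description. The validation rule is an assumption: a path is treated as valid only if it has the form //element1/element2/... with an optional trailing /@attributename, so a trailing '#' invalidates it. The names PATH_RE and parse_mapping are hypothetical and not taken from the component.

```python
import csv
import io
import re

# Assumed rule: //elem/elem/... optionally followed by /@attribute, nothing else.
PATH_RE = re.compile(
    r"^//(?P<elements>[^/@#*\s]+(?:/[^/@#*\s]+)*)(?:/@(?P<attr>[^/@#*\s]+))?$"
)


def parse_mapping(csv_text):
    """Return (column_name, element_names, attribute_or_None) for each valid row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        path = (row.get("Path") or "").strip()
        match = PATH_RE.match(path)
        if match is None:
            # Invalidated ("commented out") or unrecognised path: skip the row.
            continue
        rows.append(
            (row["Column Name"].strip(), match.group("elements").split("/"), match.group("attr"))
        )
    return rows


EXAMPLE = """Column Name,Path
*,*#
Order Id,//shiporder/@orderid
orderperson,//shiporder/orderperson#
"""

mapping = parse_mapping(EXAMPLE)
print(mapping)  # [('Order Id', ['shiporder'], 'orderid')]
```

Only the Order Id row survives, because the other two paths end in '#' and so no longer match the assumed path format.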
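
The idea behind the "Column Paths" and "Column Name Clash" ports can be sketched in a few lines. This is an illustration only, under two assumptions that the description does not confirm: that the default column name is the last segment of the path, and that an attribute's column name is the attribute name without the '@'. The functions collect_paths and find_name_clashes and the sample XML are invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def collect_paths(root):
    """List distinct //a/b and //a/b/@attr style paths in document order."""
    seen = []

    def walk(element, parent_path):
        path = f"{parent_path}/{element.tag}"
        for candidate in [path] + [f"{path}/@{name}" for name in element.attrib]:
            if candidate not in seen:
                seen.append(candidate)
        for child in element:
            walk(child, path)

    walk(root, "/")  # "/" + "/" + root tag gives the leading "//"
    return seen


def find_name_clashes(paths):
    """Group paths by assumed column name and keep names used by several paths."""
    by_name = defaultdict(list)
    for path in paths:
        name = path.rsplit("/", 1)[-1].lstrip("@")  # assumed default column name
        by_name[name].append(path)
    return {name: group for name, group in by_name.items() if len(group) > 1}


# Invented sample: two different elements are both called "name".
SAMPLE = (
    '<shiporder orderid="A1">'
    "<shipto><name>x</name></shipto>"
    "<item><name>y</name></item>"
    "</shiporder>"
)

paths = collect_paths(ET.fromstring(SAMPLE))
clashes = find_name_clashes(paths)
print(clashes)
```

Here the two "name" elements sit under different parents, so a by-name listing can only hold one of them, while a path-centric listing keeps both. Renaming one of them through the mapping file is the remedy the description suggests.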

KNIME Base nodes

KNIME Javasnippet

KNIME Python Integration

KNIME Quick Forms

Vernalis KNIME Nodes
