Reference Fragments to MMPs

This node implements the Hussain and Rea algorithm for finding Matched Molecular Pairs in a dataset. The node takes two input tables of fragments generated by MMP Molecule Fragment nodes and generates an output table of matched molecular pairs (MMPs).

In this implementation, pairs are only created between rows of the query table and rows of the reference table (the 'forwards' direction is from the 'Left' query row to the 'Right' reference row). Both tables must have the same structure.

The node requires two SMILES input columns, representing the 'key' (unchanging atoms) and 'value', and a string column containing the ID. The node will attempt to auto-guess these column selections based on the default names for the columns output by the fragment node.
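
The pairing rule described above can be illustrated with a minimal Python sketch. It is not the node's implementation: the (id, key, value) row layout, the function name find_pairs, the 'Left>>Right' transform notation, the example SMILES and the choice to skip rows with identical values are all assumptions made for illustration.

```python
from collections import defaultdict


def find_pairs(query_rows, reference_rows):
    """Pair query (Left) and reference (Right) rows that share a key.

    Each row is a hypothetical (id, key, value) tuple of strings.
    """
    # Index the reference rows by key so each query row is matched directly.
    ref_by_key = defaultdict(list)
    for ref_id, key, value in reference_rows:
        ref_by_key[key].append((ref_id, value))

    pairs = []
    for q_id, key, q_value in query_rows:
        for r_id, r_value in ref_by_key.get(key, []):
            # Assumption: identical values describe no change, so skip them.
            if q_value == r_value:
                continue
            # Forwards direction: Left (query) value to Right (reference) value.
            pairs.append((q_id, r_id, key, f"{q_value}>>{r_value}"))
    return pairs


# Illustrative rows only; the SMILES strings are made up for the example.
query = [("Q1", "[*:1]c1ccccc1", "[*:1]Cl")]
reference = [
    ("R1", "[*:1]c1ccccc1", "[*:1]F"),
    ("R2", "[*:1]C1CCCCC1", "[*:1]F"),
]
print(find_pairs(query, reference))
```

Only R1 shares its key with Q1, so a single Q1 to R1 pair is produced. No pairs are formed within the query rows or within the reference rows.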

The input tables can contain fragmentations from differing numbers of cuts, in which case this will be reflected in the output table.

The table will be pre-sorted by key and then by value during execution, unless the 'Incoming table is sorted by Keys and Values?' option is selected. If this option is selected and the incoming data are not correctly sorted, pairs may be missed (incorrect key sorting) or may be non-canonical in their direction (incorrect value sorting).
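
To see why incorrect key sorting can cause missed pairs, consider a single-pass merge over two key-sorted lists. The sketch below is one plausible way such a pass could work and is an assumption, not the node's actual code; the (key, value, id) row layout and the names grouped and merge_pairs are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def grouped(rows):
    """Collect consecutive rows with the same key into (key, rows) groups."""
    return [(k, list(g)) for k, g in groupby(rows, key=itemgetter(0))]


def merge_pairs(query_rows, reference_rows, presorted=False):
    """Pair ids of rows with equal keys in one pass over both inputs.

    Rows are hypothetical (key, value, id) tuples. With presorted=True the
    caller promises both inputs are already sorted by key, then value.
    """
    if not presorted:
        query_rows = sorted(query_rows)
        reference_rows = sorted(reference_rows)
    q, r = grouped(query_rows), grouped(reference_rows)
    i = j = 0
    pairs = []
    while i < len(q) and j < len(r):
        q_key, q_rows = q[i]
        r_key, r_rows = r[j]
        if q_key < r_key:
            i += 1  # no reference rows for this query key
        elif q_key > r_key:
            j += 1  # no query rows for this reference key
        else:
            pairs.extend((ql[2], rr[2]) for ql in q_rows for rr in r_rows)
            i += 1
            j += 1
    return pairs


query = [("B", "v1", "Q1"), ("A", "v2", "Q2")]  # not sorted by key
reference = [("A", "v3", "R1"), ("B", "v4", "R2")]

print(len(merge_pairs(query, reference)))                  # sorted first
print(len(merge_pairs(query, reference, presorted=True)))  # wrongly trusted
```

When the unsorted query list is wrongly declared as sorted, the pass advances beyond key A in the reference list before reaching it in the query list, so the Q2 to R1 pair is never found.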

Incoming columns can be passed through unchanged (Left, Right or both). For numeric columns (Integer, Long, Double and Complex Number), differences (L - R or R - L) can be calculated; ratios (L / R or R / L) can be calculated for Double columns only.
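
As a small worked illustration of the difference and ratio columns, the following sketch computes all four quantities for a single numeric property of one pair. The function name property_changes and the decision to return None for a zero denominator are assumptions; the node's handling of that case is not described here.

```python
def property_changes(left, right):
    """Return the differences and ratios for one property of a pair."""
    return {
        "L - R": left - right,
        "R - L": right - left,
        # Assumption: report None where the ratio is undefined.
        "L / R": left / right if right != 0 else None,
        "R / L": right / left if left != 0 else None,
    }


# Hypothetical property values for the Left and Right molecules of a pair.
changes = property_changes(6.0, 8.0)
print(changes["L - R"], changes["L / R"])  # -2.0 0.75
```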

Transforms can be filtered based on the Value Attachment point graph distance calculated during fragmentation, using the following options:

None - No filtering
Max total graph distance change - the sum of all graph distance changes
Max single graph distance change - the maximum tolerated change in any single distance
Tanimoto - the vector Tanimoto similarity
Dice - the vector Dice similarity
Cosine - the vector Cosine similarity
Euclidean - the vector Euclidean distance
Hamming - the vector Hamming (Manhattan or City-block) distance
Soergel - the vector Soergel distance
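
The measures listed above can be sketched as follows, using the commonly cited continuous-vector forms of each coefficient. Whether the node uses exactly these definitions is an assumption, as is the reading of the two 'graph distance change' options as absolute differences between corresponding vector elements. The example vectors are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def total_change(a, b):
    """Sum of absolute element-wise changes (Manhattan / city-block)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def max_single_change(a, b):
    """Largest absolute change in any single element."""
    return max(abs(x - y) for x, y in zip(a, b))


def tanimoto(a, b):
    ab = _dot(a, b)
    return ab / (_dot(a, a) + _dot(b, b) - ab)


def dice(a, b):
    return 2 * _dot(a, b) / (_dot(a, a) + _dot(b, b))


def cosine(a, b):
    return _dot(a, b) / math.sqrt(_dot(a, a) * _dot(b, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def soergel(a, b):
    return total_change(a, b) / sum(max(x, y) for x, y in zip(a, b))


# Hypothetical graph-distance vectors for the Left and Right values.
left, right = [1, 2], [2, 2]
print(total_change(left, right), max_single_change(left, right))  # 1 1
print(round(tanimoto(left, right), 3))                            # 0.857
print(soergel(left, right))                                       # 0.25
```

Tanimoto, Dice and Cosine are similarities (higher means more alike), whereas Euclidean, Hamming and Soergel are distances (lower means more alike), so a threshold acts as a minimum for the former and a maximum for the latter.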

Filtering can also be performed based on the change in heavy atom count during the transformation.

This node was developed by Vernalis Research. For feedback and more information, please contact knime@vernalis.com

1. J. Hussain and C. Rea, "Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large datasets", J. Chem. Inf. Model., 2010, 50, 339-348 (DOI: 10.1021/ci900450m).

Node details

Input ports

Type: Table
Reference Key-Value pairs
Fragmented molecule key-value pairs (The 'Right' part of pair in forwards direction)
Type: Table
Query Key-Value pairs
Fragmented molecule key-value pairs (The 'Left' part of pair in forwards direction)

Output ports

Type: Table
MMP transforms
Matched pair transformations
