This node implements the Hussain and Rea algorithm for finding Matched Molecular Pairs in a dataset. The node takes two input tables of fragments generated by MMP Molecule Fragment nodes and generates an output table of matched molecular pairs (MMPs).
In this implementation, pairs are only created between rows of the query and reference tables (the 'forwards' direction is from the 'Left' query row to the 'Right' reference row). Both tables must have the same structure.
The node requires two SMILES input columns, representing the 'key' (unchanging atoms) and 'value', and a string column containing the ID. The node will attempt to auto-guess these column selections based on the default names for the columns output by the fragment node.
The input tables can contain fragmentations from differing numbers of cuts, in which case this will be reflected in the output table.
The tables will be pre-sorted by key and then by value during execution, unless the 'Incoming table is sorted by Keys and Values?' option is selected. If this option is selected and the correct sorting has not been applied, then pairs may be missed (incorrect key sorting) or may be non-canonical in their direction (incorrect value sorting).
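The pairing and sorting behaviour described above can be illustrated with a minimal Python sketch. This is not the node's actual implementation: the function names, the (key, value, id) tuple layout, the 'left>>right' transform string and the decision to skip rows with identical values are all assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def group_by_key(rows):
    """Group (key, value, id) rows by key; rows must already be sorted by key."""
    return [(key, list(group)) for key, group in groupby(rows, key=itemgetter(0))]


def find_pairs(query_rows, reference_rows, presorted=False):
    """Pair query (Left) rows with reference (Right) rows sharing the same key.

    Hypothetical helper: returns (left_id, right_id, key, transform) tuples.
    """
    if not presorted:
        # Sort by key, then value (tuples sort element by element)
        query_rows = sorted(query_rows)
        reference_rows = sorted(reference_rows)
    q_groups = group_by_key(query_rows)
    r_groups = group_by_key(reference_rows)

    pairs = []
    i = j = 0
    # Merge-style walk over the two sorted lists of key groups
    while i < len(q_groups) and j < len(r_groups):
        q_key, q_rows = q_groups[i]
        r_key, r_rows = r_groups[j]
        if q_key < r_key:
            i += 1
        elif q_key > r_key:
            j += 1
        else:
            for _, q_val, q_id in q_rows:
                for _, r_val, r_id in r_rows:
                    if q_val != r_val:  # assumption: identical values are not a pair
                        pairs.append((q_id, r_id, q_key, f"{q_val}>>{r_val}"))
            i += 1
            j += 1
    return pairs


query = [("[*:1]c1ccccc1", "[*:1]Cl", "Q1"), ("[*:1]CC", "[*:1]O", "Q2")]
reference = [
    ("[*:1]c1ccccc1", "[*:1]F", "R1"),
    ("[*:1]c1ccccc1", "[*:1]Cl", "R2"),
    ("[*:1]CCC", "[*:1]N", "R3"),
]
pairs = find_pairs(query, reference)
```

Because the walk only ever moves forwards through each table, a key that appears out of order is skipped, which is why incorrectly sorted input can lead to missed pairs when the pre-sort is bypassed.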
Incoming columns can be passed through unchanged (Left, Right or both). Numeric columns (Integer, Long, Double and Complex Number) can have differences calculated (L - R or R - L), and Double columns can also have ratios calculated (L / R or R / L).
Transforms can be filtered based on the Value Attachment point graph distance calculated during fragmentation, using one of the following options:
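As a rough illustration of the difference and ratio calculations, the following Python sketch computes both directions for a pair of property values. The function names are hypothetical, and returning None for a zero denominator is an assumption; the node's own handling of that case is not described here.

```python
def difference(left, right, left_minus_right=True):
    """Return L - R, or R - L when left_minus_right is False."""
    return left - right if left_minus_right else right - left


def ratio(left, right, left_over_right=True):
    """Return L / R, or R / L when left_over_right is False.

    Assumption: a zero denominator yields None (a missing value).
    """
    numerator, denominator = (left, right) if left_over_right else (right, left)
    if denominator == 0:
        return None
    return numerator / denominator


# Example property values for the Left and Right molecules of one pair
delta_lr = difference(7.5, 5.0)
delta_rl = difference(7.5, 5.0, left_minus_right=False)
ratio_lr = ratio(6.0, 3.0)
ratio_rl = ratio(6.0, 3.0, left_over_right=False)
```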
- None - No filtering
- Max total graph distance change - the sum of all graph distance changes
- Max single graph distance change - the maximum tolerated change in any single distance
- Tanimoto - the vector Tanimoto similarity
- Dice - the vector Dice similarity
- Cosine - the vector Cosine similarity
- Euclidean - the vector Euclidean distance
- Hamming - the vector Hamming (Manhattan or City-block) distance
- Soergel - the vector Soergel distance
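The measures listed above can be sketched in Python using their common textbook definitions for two equal-length vectors of graph distances. The exact formulas used by the node are not given here, so these definitions, the function names and the use of absolute differences for the 'change' options are assumptions. The sketch also assumes non-zero vectors, since several of the measures divide by a vector sum.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def total_change(a, b):
    """Sum of absolute changes in each graph distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def max_single_change(a, b):
    """Largest absolute change in any single graph distance."""
    return max(abs(x - y) for x, y in zip(a, b))


def tanimoto(a, b):
    ab = _dot(a, b)
    return ab / (_dot(a, a) + _dot(b, b) - ab)


def dice(a, b):
    return 2 * _dot(a, b) / (_dot(a, a) + _dot(b, b))


def cosine(a, b):
    return _dot(a, b) / math.sqrt(_dot(a, a) * _dot(b, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a, b):
    """City-block form: identical to total_change under these definitions."""
    return total_change(a, b)


def soergel(a, b):
    return total_change(a, b) / sum(max(x, y) for x, y in zip(a, b))


# Hypothetical graph distance vectors for the Left and Right values of a 2-cut pair
left, right = [1, 2], [2, 4]
```

Note that Tanimoto, Dice and Cosine are similarities (higher means more alike), whereas the Euclidean, Hamming and Soergel options are distances (lower means more alike), so a threshold applies in the opposite sense for the two groups.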
1. J. Hussain and C. Rea, "Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large datasets", J. Chem. Inf. Model., 2010, 50, 339-348 (DOI: 10.1021/ci900450m).