This node implements the Hussain and Rea algorithm (Ref. 1) for finding Matched Molecular Pairs in a dataset. The user can specify the maximum number of cuts to be made (1 - 10), and whether hydrogens should be added (1 cut only). All cuts from 1 to the specified number are made.
The node implements the molecule fragmentation part of the process, enabling the fragmented molecule key-value pairs to be stored in a database for later recall or used directly in a subsequent pair-finding node.
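As a rough illustration of why the key-value form is convenient for a later pair-finding step, the following minimal Python sketch groups fragmentation results by key and pairs up molecules that share a key but differ in value. The function name `find_pairs`, the `(molecule id, key, value)` tuple layout and the example fragment SMILES are assumptions made for this sketch. They are not the node's actual output format or its pair-finding implementation.

```python
from collections import defaultdict
from itertools import combinations


def find_pairs(fragments):
    """Group (mol_id, key, value) tuples by key and pair up differing values.

    Hypothetical helper: the tuple layout is an assumption for illustration.
    """
    index = defaultdict(list)
    for mol_id, key, value in fragments:
        index[key].append((mol_id, value))

    pairs = []
    for key, entries in index.items():
        # Any two different molecules sharing a key form a candidate pair
        for (id_a, val_a), (id_b, val_b) in combinations(entries, 2):
            if id_a != id_b and val_a != val_b:
                pairs.append((id_a, id_b, key, f"{val_a}>>{val_b}"))
    return pairs


# Illustrative single-cut fragmentations (made-up input, not node output)
fragments = [
    ("toluene", "*c1ccccc1", "*C"),
    ("phenol", "*c1ccccc1", "*O"),
    ("aniline", "*c1ccccc1", "*N"),
    ("ethanol", "*CC", "*O"),
]
pairs = find_pairs(fragments)
for pair in pairs:
    print(pair)
```

Because only the key is used for grouping, the fragments can be stored once (for example in a database table keyed on the key column) and pairs recomputed later without re-fragmenting the molecules.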
A variety of fragmentation options are included:
- "All acyclic single bonds" - Any acyclic single bond between any two atoms will be broken. This is the most exhaustive approach, but can generate a large number of pairs (rSMARTS: [*:1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*])
- "Only acyclic single bonds to rings" - Single acyclic bonds between any atoms will be broken, as long as at least one atom is in a ring (rSMARTS: [*;R:1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*]).
- "Only acyclic single bonds to either rings or to double bonds exocyclic to rings" - Single acyclic bonds between any atoms will be broken, as long as one atom is either in a ring, or in a double bond exocyclic to a ring, with the other end of that double bond in the ring (rSMARTS: [*:1]!@!=!#[*;!R0,$(*=!@[*!R0]):2]>>[*:1]-[*].[*:2]-[*])
- "Only single bonds to a heteroatom" - Single acyclic bonds between any two atoms, at least one of which is not carbon, will be broken. Included to mirror the C-X bond-breaking chemistry prevalent in modern drug discovery (e.g. SNAr, reductive aminations, amide formations etc. See Ref. 2) (rSMARTS: [!#6:1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*])
- "Non-functional group single bonds" - This reproduces the fragmentation pattern used in the original Hussain/Rea paper (see footnote 24, Ref. 1), which is also used in the RDKit Python implementation (Ref. 3) (rSMARTS: [#6+0;!$(*=,#[!#6]):1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*])
- "User defined" - The user needs to provide their own rSMARTS fragmentation definition, following the guidelines below.
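To make the rSMARTS definitions above more concrete, here is a minimal sketch of applying one of them as a single cut with the RDKit Python API. It uses the "All acyclic single bonds" pattern quoted above. The function name `single_cut_fragments` and the de-duplication of results are assumptions for this sketch, and the node's own handling (attachment point tagging, multiple cuts, hydrogen addition) is not reproduced.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# "All acyclic single bonds" pattern, copied from the option list above
RSMARTS = "[*:1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*]"


def single_cut_fragments(smiles, rsmarts=RSMARTS):
    """Return the set of unique fragment-SMILES pairs from one bond cut.

    Hypothetical helper for illustration only.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    rxn = AllChem.ReactionFromSmarts(rsmarts)

    unique = set()
    # Each result is a tuple of two product molecules (one per fragment)
    for products in rxn.RunReactants((mol,)):
        smis = []
        for frag in products:
            Chem.SanitizeMol(frag)  # products are returned unsanitized
            smis.append(Chem.MolToSmiles(frag))
        # The pattern matches each bond in both directions, so sort to merge
        unique.add(tuple(sorted(smis)))
    return unique


for pair in sorted(single_cut_fragments("CCc1ccccc1")):
    print(pair)
```

Each fragment carries an unlabelled dummy atom (`*`) at the position of the broken bond, which corresponds to the `[*]` attachment points in the product side of the rSMARTS.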
Guidelines for Custom rSMARTS Definition
- '>>' is required to separate reactants and products
- Products require '[*]' to occur twice, for the attachment points (the node will handle the tagging of these)
- Reactants and products each require exactly two atom mappings, e.g. ':1]' and ':2]' (other map numbers could be used).
- The atom mappings must be two different values
- The same atom mappings must be used for reactants and products
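The guidelines above can be expressed as a simple text-level check. The sketch below is a hypothetical helper (`check_rsmarts` is not part of the node or of RDKit) that tests a string against the five rules using only string operations and a regular expression. It does not verify that the SMARTS itself is chemically valid or parseable.

```python
import re

# Matches an atom-map suffix such as ':1]' and captures the map number
_MAP_RE = re.compile(r":(\d+)\]")


def check_rsmarts(rsmarts):
    """Return a list of guideline violations (an empty list means none found)."""
    if rsmarts.count(">>") != 1:
        return ["'>>' must separate reactants and products exactly once"]

    reactants, products = rsmarts.split(">>")
    problems = []

    if products.count("[*]") != 2:
        problems.append("products must contain '[*]' exactly twice")

    r_maps = _MAP_RE.findall(reactants)
    p_maps = _MAP_RE.findall(products)
    if len(r_maps) != 2 or len(p_maps) != 2:
        problems.append("reactants and products each need exactly two atom mappings")
    if len(set(r_maps)) != len(r_maps):
        problems.append("the atom mappings must be two different values")
    if sorted(r_maps) != sorted(p_maps):
        problems.append("reactants and products must use the same atom mappings")
    return problems


print(check_rsmarts("[*;R:1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*]"))
print(check_rsmarts("[*:1]!@!=!#[*:1]>>[*:1]-[*].[*:1]-[*]"))
```

A definition that passes this check should still be tested on a few representative molecules before use, since the check only looks at the text of the definition.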
Optionally, when only a single cut is made, or when connectivity tracking is enabled, context fingerprints can be generated (one for each attachment point). The fingerprints generated are RDKit Morgan fingerprints, rooted at the attachment point(s) of the fragment key.
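As an illustration of what "rooted at the attachment point" means, the following sketch generates one Morgan fingerprint per dummy atom of a key fragment with the RDKit Python API, restricting the atom environments to those centred on that dummy atom. The function name, the radius of 2 and the 2048-bit length are assumptions for this sketch. The node's actual fingerprint settings are not stated here.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator


def attachment_point_fingerprints(key_smiles, radius=2, n_bits=2048):
    """Return one rooted Morgan bit vector per attachment point (dummy atom).

    Hypothetical helper: radius and n_bits are assumed values.
    """
    mol = Chem.MolFromSmiles(key_smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {key_smiles}")
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)

    fingerprints = []
    for atom in mol.GetAtoms():
        if atom.GetAtomicNum() == 0:  # dummy atom marks an attachment point
            # fromAtoms limits the environments to those centred on this atom
            fp = gen.GetFingerprint(mol, fromAtoms=[atom.GetIdx()])
            fingerprints.append(fp)
    return fingerprints


fps = attachment_point_fingerprints("*c1ccccc1")
print(len(fps), fps[0].GetNumOnBits())
```

Because each fingerprint only encodes the environment around one attachment point, two keys can be compared by how similar their chemistry is close to the site of change, rather than across the whole fragment.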
The algorithm is implemented using the RDKit toolkit.
1. J. Hussain and C. Rea, "Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large datasets", J. Chem. Inf. Model., 2010, 50, 339-348 (DOI: 10.1021/ci900450m).
2. S. D. Roughley and A. M. Jordan, "The Medicinal Chemist’s Toolbox: An Analysis of Reactions Used in the Pursuit of Drug Candidates", J. Med. Chem., 2011, 54, 3451-3479 (DOI: 10.1021/jm200187y).
3. G. Landrum, "An Overview of RDKit" (http://www.rdkit.org/docs/Overview.html#the-contrib-directory) (section entitled 'mmpa').