Standardizing Molecular Structures
This workflow snippet shows how to standardize chemical structures in SMILES format using the open-source RDKit nodes.
The standardization and data-cleaning steps are:
1. the removal of hydrogens
2. the removal of solvents
3. the stripping of salts
4. structure normalization
5. canonicalization
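The five steps above can be sketched in plain RDKit Python, outside KNIME. This is a minimal sketch and not the workflow's implementation. It assumes that RDKit's `SaltRemover` class stands in for the Salt Stripper node and that `rdMolStandardize.Normalizer` stands in for the Structure Normalizer node. The solvent list `SOLVENT_SMARTS` and the function `standardize` are hypothetical names chosen for illustration.

```python
from typing import Optional

from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize
from rdkit.Chem.SaltRemover import SaltRemover

# Hypothetical user-given solvent list: one SMARTS pattern per line.
SOLVENT_SMARTS = "\n".join([
    "[OH2]",      # water
    "CS(C)=O",    # DMSO
    "ClC(Cl)Cl",  # chloroform
])

solvent_remover = SaltRemover(defnData=SOLVENT_SMARTS)  # custom definitions only
salt_remover = SaltRemover()                            # RDKit's bundled salt list
normalizer = rdMolStandardize.Normalizer()              # default transformation rules


def standardize(smiles: str) -> Optional[str]:
    """Run the five steps on one SMILES string (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None                          # unparsable input
    mol = Chem.RemoveHs(mol)                 # 1. remove explicit hydrogens
    mol = solvent_remover.StripMol(mol)      # 2. remove solvent fragments
    mol = salt_remover.StripMol(mol)         # 3. strip salt fragments
    mol = normalizer.normalize(mol)          # 4. normalize functional groups
    return Chem.MolToSmiles(mol)             # 5. canonical SMILES


# Triethylamine hydrochloride recorded together with DMSO.
print(standardize("CCN(CC)CC.Cl.CS(C)=O"))
```

The solvent pass runs before the salt pass, matching the order of steps 2 and 3. `StripMol` only deletes fragments that match a pattern completely, so the parent compound is left intact even if it contains, say, a chlorine atom.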
Please note that although we read in the molecules as a KNIME-native table here, the same steps apply to data in other formats read in with the corresponding reader nodes, e.g. SMILES, SDF or Mol files. We remove explicit hydrogens as a separate first step for the sake of demonstration, but any RDKit node already does this under the hood. The Salt Stripper node is used twice: once to remove a user-given list of solvents, and once to remove its pre-defined salts. Salt removal could alternatively be done with the Structure Normalizer node. Canonicalization is the last step in this workflow.
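Two of the remarks above have direct counterparts in the RDKit Python toolkit. The sketch below illustrates the toolkit's behaviour, not the KNIME nodes themselves. First, `Chem.MolFromSmiles` already drops explicit hydrogens by default unless `removeHs` is switched off in `SmilesParserParams`. Second, the `rdMolStandardize.FragmentRemover` class strips common counter-ions using its built-in fragment list. We assume, without confirmation from the workflow, that this is comparable to the salt handling in the Structure Normalizer node.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# Methanol written with all hydrogens explicit.
explicit_h = "[H]OC([H])([H])[H]"

default_mol = Chem.MolFromSmiles(explicit_h)  # explicit Hs are removed while parsing

params = Chem.SmilesParserParams()
params.removeHs = False                       # keep the [H] atoms as real atoms
kept_mol = Chem.MolFromSmiles(explicit_h, params)

print(default_mol.GetNumAtoms())  # → 2 (O and C only)
print(kept_mol.GetNumAtoms())     # → 6 (O, C and four explicit hydrogens)

# Salt removal through the standardizer's fragment list instead of SaltRemover.
fragment_remover = rdMolStandardize.FragmentRemover()
salt_form = Chem.MolFromSmiles("CCN(CC)CC.Cl")  # triethylamine hydrochloride
parent = fragment_remover.remove(salt_form)
print(Chem.MolToSmiles(parent))
```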
The dataset is a subset of 844 compounds evaluated for activity against CDPK1. More information is available at https://chembl.gitbook.io/chembl-ntd/#deposited-set-19-5th-march-2016-uw-kinase-screening-hits (see Set 19).
Created with KNIME Analytics Platform version 5.3.0