This workflow snippet demonstrates how to train a bioactivity model from chemical structures. Hashed bit-based fingerprints are generated from each chemical structure and serve as the input for a Random Forest model. The model is trained on one part of the data set (the training set) and then applied to the remaining data (the test set), where its predictions are evaluated with the ROC Curve node and the Scorer node in a composite view.
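As a rough illustration of two ideas named above, the following Python sketch hashes structure fragments into a fixed-length bit vector and computes the area under the ROC curve from prediction scores. It is not what the KNIME nodes do internally: the function names `hashed_fingerprint` and `roc_auc` are hypothetical, and hashing character substrings of a SMILES string is only a toy stand-in for real substructure enumeration, which would normally be done by a cheminformatics toolkit.

```python
import hashlib


def hashed_fingerprint(smiles, n_bits=64, max_len=3):
    """Toy hashed fingerprint (assumption: SMILES substrings stand in for substructures)."""
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            # A stable hash keeps the bit positions identical across runs.
            digest = hashlib.sha256(fragment.encode("utf-8")).digest()
            bits[int.from_bytes(digest[:4], "big") % n_bits] = 1
    return bits


def roc_auc(labels, scores):
    """Chance that a random active outscores a random inactive (ties count as 0.5)."""
    actives = [s for label, s in zip(labels, scores) if label == 1]
    inactives = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((a > i) + 0.5 * (a == i) for a in actives for i in inactives)
    return wins / (len(actives) * len(inactives))


if __name__ == "__main__":
    fingerprint = hashed_fingerprint("CCO")  # ethanol, used purely as an example input
    print(len(fingerprint), sum(fingerprint))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))
```

Because different fragments can hash to the same position, a short bit vector loses information through collisions; longer vectors reduce this at the cost of more input columns for the model.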
The data set is a subset of 844 compounds evaluated for activity against CDPK1. Of these, 181 compounds inhibited CDPK1 with an IC50 below 1 µM and are assigned the class "active".
More information is available at https://chembl.gitbook.io/chembl-ntd/#deposited-set-19-5th-march-2016-uw-kinase-screening-hits (see Set 19).
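The class assignment described above can be sketched in Python as a simple threshold on IC50. The 1 µM cutoff and the compound counts come from the description; the function name `activity_class` and the "inactive" label for the remaining compounds are assumptions made for illustration.

```python
ACTIVE_THRESHOLD_UM = 1.0  # from the description: IC50 below 1 µM means "active"


def activity_class(ic50_um):
    """Return the class label for an IC50 value given in µM."""
    # "inactive" is an assumed name for the non-active class.
    return "active" if ic50_um < ACTIVE_THRESHOLD_UM else "inactive"


# Class balance implied by the stated counts: 181 actives out of 844 compounds.
active_fraction = 181 / 844

if __name__ == "__main__":
    print(activity_class(0.25), activity_class(1.0))
    print(f"{active_fraction:.1%}")
```

With roughly one active compound in five, the classes are imbalanced, which is one reason a ROC curve is a useful complement to plain accuracy when judging the model.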
Workflow
Machine Learning Chemistry
Used extensions & nodes
Created with KNIME Analytics Platform version 4.4.1