Fingerprint Bayesian Learner


A variant of Naive Bayes for fingerprint columns, i.e. bit vectors. The learner implements a Naive Bayes-like algorithm that accounts for sparsely occupied bits and unbalanced class distributions. Details of the algorithm are described in:

Prediction of Biological Targets for Compounds Using Multiple-Category Bayesian Models Trained on Chemogenomics Databases, Nidhi, Meir Glick, John W. Davies, and Jeremy L. Jenkins, J. Chem. Inf. Model., 2006, 46 (3), pp. 1124–1133
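One possible reading of the per-bit weight, stated here as an assumption and not checked against the article's equation numbering or the node's source, is the Laplacian-corrected estimate commonly used for multiple-category Bayesian models of this kind. Let N be the number of training rows, N_c the number of rows in class c, T_i the number of rows with bit i set, and A_{i,c} the number of those rows that belong to class c. The weight of bit i for class c and the score of a fingerprint x for class c would then be:

```latex
p_c = \frac{N_c}{N}, \qquad
w_{i,c} = \log \frac{A_{i,c} + 1}{T_i \, p_c + 1}, \qquad
\operatorname{score}_c(x) = \sum_{i \,:\, x_i = 1} w_{i,c}
```

Under this form, a bit that never occurs (T_i = 0) gets weight log 1 = 0. A bit set in a single row contributes at most log 2, instead of the log(1/p_c) that the uncorrected ratio A_{i,c} / (T_i p_c) would give. A bit that occurs in class c exactly as often as the base rate predicts (A_{i,c} = T_i p_c) also gets weight 0. This is how sparse bits and unbalanced classes are damped.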

Input Ports

  1. Type: Data. The data to learn from. It must contain a fingerprint column and a categorical class column.

Output Ports

  1. Type: Data. A table containing the scores of the training data, where each row is scored by a model trained on the remaining n-1 rows (leave-one-out). The table is sorted by descending score and contains the following columns:
       a. The true class values (copied from the input data).
       b. The leave-one-out score (the sum of the logs over all on-bits).
       c. The running error on the target class, i.e. the error on the training data if the current row and all preceding rows (which have a score greater than or equal to the current row's score) were predicted as the positive class.
       d. The running error on the negative class(es), i.e. the error if all rows below the current row were predicted as negative.
     The threshold that minimizes the sum of both error rates is used as the default cutoff in the predictor.
     Note that these scores could also be determined with a Cross-Validation meta node. They are provided here because they can be computed cheaply in a single scan over the training data, as opposed to an expensive cross-validation run.
     This table is well suited for visualization with a ROC Curve node.
  2. Type: Data. A table representing each bit's importance for the different classes. The table has one row per bit in the fingerprint. For each bit position, the columns show how often the bit is set (i) in any row and (ii) in rows of the respective target class. The value of the "logP" column is the logarithm of equation (6) in the article cited above. A value below 0 indicates that the bit is uncharacteristic of the target class, and a value above 0 indicates that the bit is characteristic of it. A value close to 0 indicates no relationship, or only a weak one, between the bit and the class.
  3. Type: Fingerprint Bayes. The learned model, which is the input to the predictor node.
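The leave-one-out table of the first output port can be produced in a single scan because removing one row only changes the counts of that row's own on-bits. The Python sketch below illustrates this under several assumptions that are not taken from the node's source. The per-bit weight is assumed to be log((A + 1) / (T * p + 1)). Removing a row is assumed to also update the class base rate p. The two running errors are read as the fraction of target-class rows ranked below the cutoff and the fraction of other rows ranked at or above it. The function names (`loo_scores`, `running_errors`, `best_threshold`) are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple

# Hypothetical sketch; the per-bit weight log((A + 1) / (T * p + 1))
# is an assumption, not taken from the KNIME implementation.


def loo_scores(fps: Sequence[Set[int]], labels: Sequence[str], target: str) -> List[float]:
    """Leave-one-out score of every row for the target class."""
    n = len(fps)
    n_target = sum(1 for y in labels if y == target)
    total = Counter()   # T_i: number of rows with bit i set
    active = Counter()  # A_i: number of target-class rows with bit i set
    for bits, y in zip(fps, labels):
        total.update(bits)
        if y == target:
            active.update(bits)
    scores = []
    for bits, y in zip(fps, labels):
        hit = 1 if y == target else 0
        # Remove the current row from all counts, including the base rate.
        p = (n_target - hit) / (n - 1)
        score = 0.0
        for b in bits:
            t = total[b] - 1
            a = active[b] - hit
            score += math.log((a + 1) / (t * p + 1))
        scores.append(score)
    return scores


def running_errors(
    scores: Sequence[float], is_target: Sequence[bool]
) -> List[Tuple[int, float, float, float]]:
    """Rows sorted by descending score as (row, score, target error, negative error)."""
    n_pos = sum(is_target)
    n_neg = len(is_target) - n_pos
    order = sorted(range(len(scores)), key=lambda r: -scores[r])
    table = []
    tp = fp = 0
    i = 0
    while i < len(order):
        # Rows with equal scores fall on the same side of any cutoff.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            j += 1
        for r in order[i:j]:
            if is_target[r]:
                tp += 1
            else:
                fp += 1
        target_err = (n_pos - tp) / n_pos  # target rows ranked below the cutoff
        negative_err = fp / n_neg          # other rows ranked at or above it
        for r in order[i:j]:
            table.append((r, scores[r], target_err, negative_err))
        i = j
    return table


def best_threshold(table: Sequence[Tuple[int, float, float, float]]) -> float:
    """Score whose cutoff minimizes the sum of both error rates."""
    return min(table, key=lambda row: row[2] + row[3])[1]


if __name__ == "__main__":
    fps = [{0, 1}, {0, 2}, {0, 1}, {3}, {3, 2}, {3}]
    labels = ["A", "A", "A", "B", "B", "B"]
    scores = loo_scores(fps, labels, "A")
    table = running_errors(scores, [y == "A" for y in labels])
    print(best_threshold(table))
```

The tie handling matters because the description defines the cutoff as "score greater than or equal to", so rows with identical scores must always be predicted together.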
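The bit-importance table of the second output port reduces to counting, per bit, how many rows set it overall and per class. The Python sketch below builds such a table. It assumes the same Laplacian-corrected form log((A + 1) / (T * p + 1)) for the "logP" column, which is an assumption about equation (6) rather than the node's confirmed formula. The function name `bit_table` and the column names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set


def bit_table(fps: Sequence[Set[int]], labels: Sequence[str], n_bits: int) -> List[Dict[str, float]]:
    """One row per bit position with overall count, per-class counts and logP."""
    n = len(labels)
    class_size = Counter(labels)
    total = Counter()                               # rows with the bit set
    per_class = {c: Counter() for c in class_size}  # class rows with the bit set
    for bits, y in zip(fps, labels):
        total.update(bits)
        per_class[y].update(bits)
    table = []
    for i in range(n_bits):
        row = {"bit": i, "total": total[i]}
        for c in sorted(class_size):
            p = class_size[c] / n  # base rate of class c
            a = per_class[c][i]
            row[f"{c} count"] = a
            # Assumed form of equation (6): 0 means no relationship,
            # > 0 characteristic of class c, < 0 uncharacteristic.
            row[f"{c} logP"] = math.log((a + 1) / (total[i] * p + 1))
        table.append(row)
    return table
```

A bit that never occurs, or that occurs in a class exactly as often as that class's base rate predicts, gets a logP of exactly 0. This matches the "no relationship" reading in the port description.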

Find here

Chemistry > Mining
