Hierarchical Clustering based on molecular fingerprints
Available linkage types:
- single
- complete
- average
- centroid
- mcquitty
- ward
- weightedcentroid
- flexiblebeta
- schrodinger
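Most of the linkage types listed above are commonly described with the Lance-Williams update rule, which gives the distance from an existing cluster k to the cluster formed by merging i and j. The sketch below uses the textbook coefficients; whether canvasHCBuild computes its linkages this way is an assumption, the function name `lance_williams` and the default `beta` are hypothetical, and the proprietary `schrodinger` linkage is left out because its definition is not given here.

```python
def lance_williams(linkage, d_ki, d_kj, d_ij, n_i, n_j, n_k, beta=-0.25):
    """Distance from cluster k to the merger of clusters i and j.

    d_ki, d_kj, d_ij are the current inter-cluster distances and
    n_i, n_j, n_k are cluster sizes. For centroid, weightedcentroid and
    ward the textbook coefficients apply to squared Euclidean distances.
    """
    n_ij = n_i + n_j
    total = n_ij + n_k
    if linkage == "single":
        a_i, a_j, b, g = 0.5, 0.5, 0.0, -0.5
    elif linkage == "complete":
        a_i, a_j, b, g = 0.5, 0.5, 0.0, 0.5
    elif linkage == "average":
        a_i, a_j, b, g = n_i / n_ij, n_j / n_ij, 0.0, 0.0
    elif linkage == "mcquitty":
        a_i, a_j, b, g = 0.5, 0.5, 0.0, 0.0
    elif linkage == "centroid":
        a_i, a_j, b, g = n_i / n_ij, n_j / n_ij, -(n_i * n_j) / n_ij ** 2, 0.0
    elif linkage == "weightedcentroid":
        a_i, a_j, b, g = 0.5, 0.5, -0.25, 0.0
    elif linkage == "ward":
        a_i, a_j, b, g = (n_i + n_k) / total, (n_j + n_k) / total, -n_k / total, 0.0
    elif linkage == "flexiblebeta":
        # beta is a user-chosen parameter; -0.25 is only an illustrative default.
        a_i = a_j = (1.0 - beta) / 2.0
        b, g = beta, 0.0
    else:
        raise ValueError(f"no textbook coefficients for linkage {linkage!r}")
    return a_i * d_ki + a_j * d_kj + b * d_ij + g * abs(d_ki - d_kj)
```

With these coefficients, single linkage reduces to the smaller of the two distances and complete linkage to the larger, which is a quick sanity check on the formula.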
The statsFile reports clustering-quality statistics for each possible number of clusters (n).
Definitions of the statistics used in statsFile
R-Squared (RSQ) represents 1.0 - (W/T), where:
W is the within-cluster variance summed over all n clusters and
T is the total variance
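As a rough illustration of the RSQ formula, the sketch below treats W as the within-cluster sum of squared deviations, added up over all clusters, and T as the sum of squared deviations of every point from the overall centroid. Both readings, the coordinate-based input, and the function names are assumptions made for the example; the node itself works on fingerprints and may compute these quantities differently.

```python
def sum_sq_dev(points):
    """Sum of squared distances from each point to the centroid of `points`."""
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return sum((p[d] - centroid[d]) ** 2 for p in points for d in range(dim))


def r_squared(clusters):
    """RSQ = 1.0 - (W / T) for a list of clusters, each a list of points."""
    all_points = [p for cluster in clusters for p in cluster]
    t = sum_sq_dev(all_points)                  # total variation
    w = sum(sum_sq_dev(c) for c in clusters)    # within-cluster variation
    if t == 0:
        raise ValueError("all points are identical, so RSQ is undefined")
    return 1.0 - w / t


# Two tight, well-separated clusters give an RSQ close to 1.
example = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
rsq = r_squared(example)
```

Under this reading, a single cluster holding every point gives RSQ = 0 and one cluster per point gives RSQ = 1, so RSQ rises as n grows.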
Semipartial R-Squared (SPRSQ) represents the gradient of the above metric, i.e. how much RSQ changes between successive values of n.
SPRSQRank is the rank of the SPRSQ value over all possible choices of n (for clarity, only the top sqrt(n) ranks are listed). It is useful for choosing a locally optimal n within a desired range.
Kelley Penalty is Kelley's clustering efficiency metric (Kelley et al., Protein Engineering 9(11), pp. 1063-1065 (1996)).
IsKelleyMinimum indicates whether this number of clusters gives the global minimum of the Kelley Penalty. It is useful for choosing a globally optimal n.
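The Kelley penalty from the cited paper can be sketched as follows: the average cluster spread at each clustering level is rescaled to the range 1 to N-1, where N is the number of clustered items, and the number of clusters at that level is added to it. The level with the lowest penalty is the one a flag such as IsKelleyMinimum would mark. The input layout (a dict from number of clusters to average spread) and the function names are hypothetical, and how canvasHCBuild computes the spread is not stated here.

```python
def kelley_penalties(avg_spread, n_items):
    """Return {n_clusters: penalty} from {n_clusters: average spread}.

    penalty = normalized spread + n_clusters, where the spread is linearly
    rescaled so that its minimum maps to 1 and its maximum to n_items - 1.
    """
    lo = min(avg_spread.values())
    hi = max(avg_spread.values())
    span = hi - lo
    penalties = {}
    for n_clusters, spread in avg_spread.items():
        if span == 0:
            norm = 1.0  # all levels have the same spread
        else:
            norm = (n_items - 2) / span * (spread - lo) + 1.0
        penalties[n_clusters] = norm + n_clusters
    return penalties


def kelley_minimum(penalties):
    """Number of clusters with the lowest penalty (the global minimum)."""
    return min(penalties, key=penalties.get)


# Made-up spreads for 5 items clustered into 1 to 4 clusters.
spreads = {1: 4.0, 2: 1.0, 3: 0.5, 4: 0.0}
penalties = kelley_penalties(spreads, n_items=5)
best_n = kelley_minimum(penalties)
```

The penalty trades off tight clusters (low spread) against having many clusters, which is why a single global minimum is a reasonable default choice of n.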
Backend implementation
utilities/canvasHCBuild is used to implement this node.