Aqueous Solubility Model 002b

Researcher: Lois Hard

Objective

To create a general open model predicting the aqueous solubility of organic compounds using open descriptors.

Data

Conversions: The aqueous solubility descriptors were converted to canonical SMILES.

Deletions: One salt, all NAs, and all zero descriptors were deleted.

Additions: The logS values were copied to the canonical SMILES descriptors.
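The deletion step can be sketched in base R. This is only an illustration under stated assumptions: the toy table and its descriptor columns (XLogP, nAcid, TPSA) are hypothetical, and "NA's" and "zero descriptors" are read here as descriptor columns that contain a missing value or are zero for every compound. The notebook does not record the exact commands that were used.

```r
# Hypothetical toy table: one row per compound, logS plus three descriptors.
raw <- data.frame(
  logS  = c(-1.2, -3.4, 0.5, -2.1),
  XLogP = c(1.1, 3.0, -0.4, 2.2),
  nAcid = c(0, 0, 0, 0),            # zero for every compound
  TPSA  = c(20.2, NA, 63.3, 37.3)   # contains a missing value
)

# Drop descriptor columns that contain any NA.
has_na  <- vapply(raw, function(col) any(is.na(col)), logical(1))
cleaned <- raw[, !has_na, drop = FALSE]

# Drop descriptor columns that are zero for every compound (always keep logS).
all_zero <- vapply(cleaned, function(col) all(col == 0), logical(1))
all_zero["logS"] <- FALSE
mydata <- cleaned[, !all_zero, drop = FALSE]

names(mydata)
```

In this toy case only logS and XLogP survive. The result is named mydata to match the data argument in the randomForest call reported under Results.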

Procedure

Packages Downloaded/Installed: caret, foreach, randomForest, iterators, plyr
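A minimal sketch for checking that these packages are available before running the model. The package names are given in their CRAN spelling. The check-then-install pattern is an assumption, not the procedure recorded in this notebook.

```r
# CRAN names of the packages listed above.
pkgs <- c("caret", "foreach", "randomForest", "iterators", "plyr")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise.
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!available]

if (length(missing_pkgs) > 0) {
  # Printed rather than executed, so the sketch has no side effects.
  message("To install: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  message("All packages are installed.")
}
```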

Descriptor Calculations: Performed with Rajarshi Guha’s CDK Descriptor Calculator GUI.

Results

library("caret")
 [1] 26 97 25 47 95 46 30 27 48 28 90 52 53 49 45 54 44 50 55 88 11 56 61 57 58
[26] 63 64 38 12

Call:
randomForest(formula = logS ~ ., data = mydata, importance = TRUE)
Type of random forest: regression
Number of trees: 500
No. of variables tried at each split: 32

Mean of squared residuals: 0.730993
% Var explained: 81.76
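
The reported statistics can be related to each other with a short calculation. randomForest reports the mean of squared residuals from out-of-bag predictions and defines % Var explained as 100 * (1 - MSE / Var(logS)). The sketch below copies the numbers from the printout. The derived quantities are back-calculations, not values recorded in the notebook, and the predictor count assumes mtry was left at its regression default of floor(p / 3), since the call does not set it.

```r
# Values copied from the randomForest printout.
mse     <- 0.730993  # "Mean of squared residuals" (out-of-bag)
pct_var <- 81.76     # "% Var explained"
mtry    <- 32        # "No. of variables tried at each split"

# Root-mean-square error, in logS units.
rmse <- sqrt(mse)

# % Var explained = 100 * (1 - mse / var(y)), so var(y) can be back-calculated.
implied_var <- mse / (1 - pct_var / 100)

# With the default mtry = floor(p / 3), the predictor count p must satisfy
# floor(p / 3) == 32.
p_candidates <- which(floor(seq_len(200) / 3) == mtry)

round(rmse, 3)
round(implied_var, 2)
p_candidates
```

Under these assumptions the RMSE is about 0.855 log units, the implied variance of logS is about 4.01, and the model was fitted on 96 to 98 descriptors.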

Conclusion

The random forest run retained the permutation-based variable importance measures (importance = TRUE), and the 81.76% of variance explained for the A002b data set is within an acceptable range for the model.