ASM002b

Aqueous Solubility Model 002b

**Researcher: Lois Hard**

**Objective**
To create a general open model predicting the aqueous solubility of organic compounds using open descriptors.

**Data**
Conversions: The aqueous solubility descriptors were converted to canonical SMILES. Deletions: 1 salt, all NAs, and zero descriptors were deleted. Additions: The logS values were copied to the canonical SMILES descriptors.
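The deletions listed above could be sketched as follows. This is a minimal Python illustration, not the workflow actually used for this data set. It rests on two assumptions: that a salt can be spotted by a "." in its SMILES string (a common heuristic for multi-component records), and that "zero descriptors" means descriptor columns whose value is zero for every compound. The function name `clean_rows`, the column names, and the toy values are all hypothetical.

```python
def clean_rows(rows, smiles_key="smiles", target_key="logS"):
    """Apply the three deletions to a list of {column: value} dicts."""
    # 1. Drop salts. Assumption: a '.' in the SMILES marks a multi-component record.
    kept = [r for r in rows if "." not in r[smiles_key]]
    # 2. Drop rows containing any missing value (None stands in for R's NA).
    kept = [r for r in kept if all(v is not None for v in r.values())]
    if not kept:
        return []
    # 3. Drop descriptor columns that are zero for every remaining compound.
    #    Assumption: this is what "zero descriptors" refers to.
    descriptors = [k for k in kept[0] if k not in (smiles_key, target_key)]
    all_zero = {k for k in descriptors if all(r[k] == 0 for r in kept)}
    return [{k: v for k, v in r.items() if k not in all_zero} for r in kept]


# Toy rows with made-up values, for illustration only.
rows = [
    {"smiles": "CCO", "logS": 1.0, "desc_a": 3, "desc_b": 0},
    {"smiles": "[Na+].[Cl-]", "logS": 0.5, "desc_a": 2, "desc_b": 0},
    {"smiles": "c1ccccc1", "logS": -1.5, "desc_a": None, "desc_b": 0},
    {"smiles": "CCCC", "logS": -3.0, "desc_a": 4, "desc_b": 0},
]
cleaned = clean_rows(rows)
print([r["smiles"] for r in cleaned])
print(sorted(cleaned[0]))
```

In this toy example the salt row and the row with a missing descriptor are dropped, and the all-zero column `desc_b` is removed from the two remaining rows.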

**Procedure**
Download/Installation of R Packages: caret, foreach, randomForest, iterators, plyr

Descriptor Calculations: Performed by Rajarshi Guha’s CDK descriptor calculator GUI.

Results:
library("caret")
 [1] 26 97 25 47 95 46 30 27 48 28 90 52 53 49 45 54 44 50 55 88 11 56 61 57 58
[26] 63 64 38 12

Call:
 randomForest(formula = logS ~ ., data = mydata, importance = TRUE)
               Type of random forest: regression
                     Number of trees: 500
No. of variables tried at each split: 32

          Mean of squared residuals: 0.730993
                    % Var explained: 81.76
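The two figures reported above are linked. For a regression forest, randomForest reports % Var explained as 100 × (1 − MSE / Var(logS)), where MSE is the out-of-bag mean of squared residuals and the variance is taken with an n denominator. The short Python sketch below is not part of the original R session, and its function names are hypothetical. It back-calculates the variance of logS implied by the reported values and checks that the relation round-trips.

```python
def pct_var_explained(mse, var_y):
    """Pseudo R-squared as a percentage: 100 * (1 - MSE / Var(y))."""
    return 100.0 * (1.0 - mse / var_y)


def implied_variance(mse, pct_explained):
    """Invert the relation above to recover Var(y) from the printed summary."""
    return mse / (1.0 - pct_explained / 100.0)


mse = 0.730993  # Mean of squared residuals, as reported
pct = 81.76     # % Var explained, as reported (rounded to two decimals)

var_logs = implied_variance(mse, pct)
print(round(var_logs, 2))                             # implied Var(logS)
print(round(pct_var_explained(mse, var_logs), 2))     # recovers the reported percentage
```

Because the printed percentage is rounded, the implied variance of logS is only approximate (about 4.0 in squared log units).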

**Conclusion**
The random forest model retained permutation-based variable importance (importance = TRUE), and the 81.76% of variance explained for the ASM002b data set is within an acceptable range for the model.