SolubilityModel005

Solubility of Carboxylic Acids in Methanol (Last Updated: June 11, 2010)
Researcher: Andrew Lang

Data is from the Open Notebook Science Challenge: [|SolubilitiesSum2010-05-17.xls]

Because some solutes have multiple measured solubilities, repeated measurements were aggregated by taking the mean value. All solutes that have a measured solubility in methanol and are solid at room temperature were included; inorganics and entries marked DONOTUSE were excluded.
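The aggregation step described above can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used for the model; the record layout, the solute names, and all numeric values are hypothetical placeholders, not entries from the spreadsheet.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (solute, solvent, solubility, flag).
# The values are placeholders for illustration, not measured data.
measurements = [
    ("solute A", "methanol", 2.0, ""),
    ("solute A", "methanol", 3.0, ""),
    ("solute A", "methanol", 9.0, "DONOTUSE"),  # flagged entry, skipped
    ("solute B", "methanol", 2.5, ""),
    ("solute A", "ethanol", 7.0, ""),           # other solvent, skipped
]

def mean_solubility(records, solvent="methanol"):
    """Average repeated measurements per solute for one solvent."""
    grouped = defaultdict(list)
    for solute, rec_solvent, value, flag in records:
        if rec_solvent != solvent or flag == "DONOTUSE":
            continue
        grouped[solute].append(value)
    return {solute: mean(values) for solute, values in grouped.items()}

averaged = mean_solubility(measurements)
print(averaged)
```

Filtering on physical state (solid at room temperature) and on inorganics would need extra columns that are not modelled in this sketch.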

This left 47 solutes: [|AllDataWithDescriptors.xlsx]

The descriptors were calculated using Bioclipse version 2.4.0.RC1; only the CDK REST descriptors were included. A random forest was built in R to identify the 30 most important descriptors.



Models were built using linear regression with forward stepwise selection of descriptors, and a 10-fold cross-validation was run in R.
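The general idea of forward stepwise selection combined with 10-fold cross-validation can be sketched as below. This is a standard-library Python illustration, not the R code used for the model. The page does not state which criterion drove the selection, so using the cross-validated RMSE as the criterion is an assumption, as are all function names and the synthetic data.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit(rows, y, cols):
    """Ordinary least squares (intercept first) via the normal equations."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, row, cols):
    return beta[0] + sum(b * row[c] for b, c in zip(beta[1:], cols))

def cv_rmse(rows, y, cols, k=10, seed=0):
    """Root-mean-square error of held-out predictions over k folds."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    sq_err = 0.0
    for f in range(k):
        test = set(idx[f::k])
        train = [i for i in idx if i not in test]
        beta = fit([rows[i] for i in train], [y[i] for i in train], cols)
        sq_err += sum((y[i] - predict(beta, rows[i], cols)) ** 2 for i in test)
    return (sq_err / len(rows)) ** 0.5

def forward_stepwise(rows, y, max_terms, tol=1e-9):
    """Greedily add the descriptor that lowers the cross-validated RMSE most."""
    chosen, best = [], float("inf")
    while len(chosen) < max_terms:
        scores = {c: cv_rmse(rows, y, chosen + [c])
                  for c in range(len(rows[0])) if c not in chosen}
        c, score = min(scores.items(), key=lambda kv: kv[1])
        if best - score <= tol:  # no meaningful improvement: stop
            break
        chosen.append(c)
        best = score
    return chosen, best

# Synthetic data: the response depends on columns 0 and 2 only.
rng = random.Random(1)
rows = [[rng.random() for _ in range(3)] for _ in range(40)]
y = [1.0 + 2.0 * r[0] - 3.0 * r[2] for r in rows]
chosen, best = forward_stepwise(rows, y, max_terms=3)
print(sorted(chosen), best)
```

On the synthetic data the irrelevant column should be left out, because adding it does not reduce the held-out error. With only 47 solutes and 30 candidate descriptors, a selection loop of this kind can overfit, which is one reason to judge candidate models on held-out error rather than on the training fit alone.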



The summary of the 12 descriptor model is below:
code
Residuals:
    Min      1Q  Median      3Q     Max
-0.5428 -0.1653 -0.0197  0.2331  0.4971

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.42744    0.30432    4.69  4.3e-05 ***
VCCalogp     0.42593    0.06449    6.60  1.4e-07 ***
VCClogs     -0.30819    0.14562   -2.12  0.04171 *
nAtomP      -0.17012    0.02573   -6.61  1.4e-07 ***
ATSm2        0.24124    0.10687    2.26  0.03053 *
ATSm3       -0.14630    0.05001   -2.93  0.00609 **
ATSc2       -3.57948    0.99193   -3.61  0.00098 ***
ATSp1       -0.04060    0.00758   -5.36  5.9e-06 ***
ATSp2        0.03552    0.00669    5.31  6.8e-06 ***
AMR          0.03140    0.01047    3.00  0.00504 **
C2SP3        0.14643    0.05700    2.57  0.01476 *
SP.4        -0.63930    0.26188   -2.44  0.02000 *
VP.6        -4.95101    0.88413   -5.60  2.9e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.322 on 34 degrees of freedom
Multiple R-squared: 0.875,     Adjusted R-squared: 0.831
F-statistic: 19.9 on 12 and 34 DF, p-value: 6.14e-12
code
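The degrees of freedom, adjusted R-squared, and F-statistic in the summary follow from the number of solutes (47), the number of descriptors (12), and the multiple R-squared (0.875). A short Python check of that arithmetic, using only the rounded values reported above:

```python
n_solutes = 47        # solutes left after filtering
n_descriptors = 12    # descriptors in the model (intercept not counted)
r_squared = 0.875     # multiple R-squared, as reported (rounded)

# Residual degrees of freedom: observations minus fitted parameters.
residual_df = n_solutes - n_descriptors - 1

# Adjusted R-squared penalises the fit for the number of descriptors.
adjusted = 1 - (1 - r_squared) * (n_solutes - 1) / residual_df

# F-statistic for the overall regression; it agrees with the reported
# 19.9 only to within the rounding of R-squared.
f_stat = (r_squared / n_descriptors) / ((1 - r_squared) / residual_df)

print(residual_df, round(adjusted, 3))  # prints: 34 0.831
```

The gap between 0.875 and 0.831 reflects the cost of fitting 12 descriptors to 47 solutes.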

The actual vs. predicted results can be found in this [|summary spreadsheet]; see the figure below.



Conclusion
The above model can be used to make predictions for carboxylic acids with unknown methanol solubility.
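Applying the model amounts to evaluating the fitted linear equation with the coefficients from the summary. The Python sketch below assumes the 12 descriptor values for a new compound have already been calculated (for example with the CDK descriptors used here); the function name and the descriptor values in the usage example are hypothetical, and the result is in whatever solubility units the model was trained on, which this page does not state.

```python
# Coefficients copied from the model summary above.
COEFFICIENTS = {
    "(Intercept)": 1.42744,
    "VCCalogp": 0.42593,
    "VCClogs": -0.30819,
    "nAtomP": -0.17012,
    "ATSm2": 0.24124,
    "ATSm3": -0.14630,
    "ATSc2": -3.57948,
    "ATSp1": -0.04060,
    "ATSp2": 0.03552,
    "AMR": 0.03140,
    "C2SP3": 0.14643,
    "SP.4": -0.63930,
    "VP.6": -4.95101,
}

def predict_response(descriptors):
    """Evaluate the linear model for one compound's descriptor values."""
    names = [name for name in COEFFICIENTS if name != "(Intercept)"]
    missing = [name for name in names if name not in descriptors]
    if missing:
        raise KeyError("missing descriptors: " + ", ".join(missing))
    return COEFFICIENTS["(Intercept)"] + sum(
        COEFFICIENTS[name] * descriptors[name] for name in names
    )

# Hypothetical descriptor values, for illustration only.
example = {name: 0.0 for name in COEFFICIENTS if name != "(Intercept)"}
example["VCCalogp"] = 1.0
print(predict_response(example))
```

Predictions are most trustworthy for compounds whose descriptor values fall within the range covered by the 47 training solutes.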