Recalculating Solvent Coefficients

Researchers: Jean-Claude Bradley, Michael H Abraham, William E Acree, Jr., and Andrew SID Lang
All content, models and data are released as CC0

Objective

To recalculate the Abraham model solvent coefficients with the c-coefficient constrained to zero, and then to create and publish models that predict these coefficients. See model003 for background.

Procedure

The Abraham general solvation model uses the linear free energy relationship (LFER)

log P = c + e E + s S + a A + b B + v V

where c, e, s, a, b, v are the solvent coefficients and E, S, A, B, V are the solute descriptors. The Abraham coefficients are found by linear regression on measured data, and the standard procedure lets the c-coefficient (the intercept) float. We suggest that little predictive ability is lost by requiring c to be zero, and doing so makes solvents easier to compare.

To compare current solvents with each other, and potential new solvents with current ones, we therefore recalculated the coefficients for known solvents, denoted e_0, s_0, a_0, b_0, v_0, with c fixed at zero. We did this by calculating log P values in over 90 solvents for ???? compounds with known Abraham descriptors from our ((figshare database?????)) and then rerunning the linear regression in R without an intercept. The following code and output, for isopropyl myristate, are typical:
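# Working directory holding the prepared data (path elided in the original)
setwd(".../MakingCZero")
# Calculated log P values (one column per solvent) plus descriptors E, S, A, B, V;
# compound IDs from the csid column become the row names
mydata <- read.csv(file = "makingczeroreadyforR.csv", header = TRUE, row.names = "csid")
# Regress log P on the descriptors; "0 +" removes the intercept, forcing c = 0
fit <- lm(isopropyl.myristate ~ 0 + E + S + A + B + V, data = mydata)
# Summary of fit
summary(fit)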
 
[output]
 
UPDATE BELOW.........................
 
Call:
lm(formula = isopropyl.myristate ~ 0 + E + S + A + B + V, data = mydata)
 
Residuals:
     Min       1Q   Median       3Q      Max
-0.55191 -0.25598 -0.13732  0.00069  1.78549
 
Coefficients:
   Estimate Std. Error t value Pr(>|t|)
E  0.977259   0.011781   82.95   <2e-16 ***
S -1.294959   0.014814  -87.41   <2e-16 ***
A -1.870114   0.020493  -91.26   <2e-16 ***
B -4.017729   0.015120 -265.73   <2e-16 ***
V  3.939081   0.007844  502.19   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 0.2503 on 2139 degrees of freedom
Multiple R-squared:  0.9958,    Adjusted R-squared:  0.9958
F-statistic: 1.009e+05 on 5 and 2139 DF,  p-value: < 2.2e-16
[output]
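Because the data file is not reproduced here, the following is a minimal sketch of the same recalculation on simulated data, assuming base R only. The helper `abraham_logP`, the descriptor ranges and the "original" coefficients are hypothetical, invented for illustration. The sketch computes log P values from an equation with a nonzero c, refits them with and without an intercept (`0 +` in an R formula drops the intercept), and reports the root-mean-square error of the c = 0 fit as one way to gauge how much predictive ability is lost.

```r
# Sketch of the c = 0 recalculation on simulated data. The helper name,
# descriptor ranges and "original" coefficients are all hypothetical.

# Evaluate log P = c + eE + sS + aA + bB + vV for one solvent and many solutes
abraham_logP <- function(coefs, desc) {
  # coefs: named numeric vector with elements c, e, s, a, b, v
  # desc:  data frame with columns E, S, A, B, V (one row per solute)
  X <- as.matrix(desc[, c("E", "S", "A", "B", "V"), drop = FALSE])
  as.vector(coefs[["c"]] + X %*% coefs[c("e", "s", "a", "b", "v")])
}

set.seed(1)
n <- 500
# Hypothetical solute descriptors drawn uniformly from rough ranges
desc <- data.frame(E = runif(n, 0, 2), S = runif(n, 0, 2), A = runif(n, 0, 1),
                   B = runif(n, 0, 1.5), V = runif(n, 0.3, 3))
# Hypothetical "original" solvent coefficients with a nonzero intercept c
orig <- c(c = 0.3, e = 0.6, s = -1.2, a = -1.8, b = -4.0, v = 3.9)

# Step 1: calculated log P values from the original (c floating) equation
desc$logP <- abraham_logP(orig, desc)

# Step 2: refit with and without the intercept
fit_c  <- lm(logP ~ E + S + A + B + V, data = desc)      # c floats
fit_c0 <- lm(logP ~ 0 + E + S + A + B + V, data = desc)  # c forced to zero

coef(fit_c)    # reproduces orig, since the simulated data are noise-free
coef(fit_c0)   # adjusted coefficients e_0, s_0, a_0, b_0, v_0

# Step 3: RMS deviation of the c = 0 model from the calculated log P values
rmse_c0 <- sqrt(mean(residuals(fit_c0)^2))
rmse_c0
```

With real data, `desc` would be replaced by the rows of makingczeroreadyforR.csv and `logP` by a solvent column such as isopropyl.myristate.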
The following table lists the original solvent coefficients together with the c = 0 adjusted coefficients. Not surprisingly, the largest changes occur for solvents whose c-values are furthest from zero. What is more intriguing is that the coefficients all shift in a consistent direction.

UPDATE BELOW...............

That is, solvents with negative c-values all saw an increase in e and b (and a decrease in s, a, and v) on recalculation, whereas solvents with positive c-values all saw an increase in s, a, and v (and a decrease in e and b). Multiplying the average absolute error (AAE) between the original and adjusted values of a coefficient by the mean value of the corresponding descriptor gives a measure of how much that coefficient changed. By this measure (e.g. AAE(v_0) * Mean(V) for v), the adjusted coefficients changed in the order v (0.124), s (0.043), e (0.013), b (0.011), a (0.010).
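The change measure and the sign pattern described above can be sketched in R as follows, assuming base R only. All coefficient values and mean descriptor values below are made up for illustration (they are not the values in the table), and the object names are hypothetical.

```r
# Hypothetical original coefficients: one row per solvent (values are made up)
original <- data.frame(c = c(0.2, -0.3, 0.1), e = c(0.6, 0.4, 0.5),
                       s = c(-1.2, -0.8, -1.0), a = c(-1.8, -2.0, -1.5),
                       b = c(-4.0, -3.5, -4.2), v = c(3.9, 4.1, 4.0))
# Hypothetical c = 0 adjusted coefficients for the same solvents
adjusted <- data.frame(e = c(0.55, 0.45, 0.48), s = c(-1.1, -0.9, -0.97),
                       a = c(-1.75, -2.05, -1.48), b = c(-4.05, -3.45, -4.22),
                       v = c(4.0, 3.95, 4.05))
# Hypothetical mean descriptor values over the compound set (E, S, A, B, V)
mean_desc <- c(e = 0.9, s = 1.0, a = 0.2, b = 0.5, v = 1.2)

slopes <- c("e", "s", "a", "b", "v")
# AAE of each coefficient across solvents, scaled by the mean descriptor value
change <- sapply(slopes, function(k)
  mean(abs(adjusted[[k]] - original[[k]])) * mean_desc[[k]])
sort(change, decreasing = TRUE)

# Sign check: direction of each coefficient shift alongside the sign of c
shift_sign <- sign(as.matrix(adjusted[slopes]) - as.matrix(original[slopes]))
cbind(sign_c = sign(original$c), shift_sign)
```

In the sign table, a row with sign_c = 1 should show +1 under s, a, v and -1 under e, b if the pattern reported above holds; the invented numbers here were chosen to follow it.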