An open logP model based upon Abraham Descriptors

Researchers: Jean-Claude Bradley, William E Acree, Jr., Andrew SID Lang

Background

Linear free energy relationships (LFERs) are used to model numerous physical/chemical absorption/partition-type properties. Many of the published relationships use Abraham Descriptors as their solute descriptors. Thus, given an LFER, we can use our open model for Abraham Descriptors to predict many absorption/partition properties. For example, given an LFER for logP, we can use our model for Abraham Descriptors to predict logP.

Procedure

Abraham et al. published a relationship for logP based upon 613 compounds with known Abraham solute descriptors: [1]

logP = 0.088 + 0.562E - 1.054S + 0.034A - 3.460B + 3.814V (1)

More recently, Mannhold et al. published 'updated' coefficients: [2]

logP = 0.395 + 0.738E - 0.586S - 0.338A - 2.972B + 2.74V (2)
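As a minimal sketch of how equations 1 and 2 are applied, the snippet below evaluates both for one set of solute descriptors. The coefficients are copied from the equations above; the names Descriptors, COEFFS, and predict_logp, and the descriptor values themselves, are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Descriptors(NamedTuple):
    """Abraham solute descriptors used in equations 1 and 2."""
    E: float
    S: float
    A: float
    B: float
    V: float


# Coefficients (c, e, s, a, b, v) as printed in equations 1 and 2.
COEFFS = {
    1: (0.088, 0.562, -1.054, 0.034, -3.460, 3.814),
    2: (0.395, 0.738, -0.586, -0.338, -2.972, 2.74),
}


def predict_logp(d: Descriptors, equation: int) -> float:
    """Evaluate logP = c + eE + sS + aA + bB + vV for the chosen equation."""
    c, e, s, a, b, v = COEFFS[equation]
    return c + e * d.E + s * d.S + a * d.A + b * d.B + v * d.V


# Hypothetical descriptor values, not taken from any data set.
example = Descriptors(E=0.6, S=0.5, A=0.0, B=0.15, V=0.7)
for eq in (1, 2):
    print(f"equation {eq}: logP = {predict_logp(example, eq):.3f}")
```

For these made-up descriptors the two equations give predictions within a few hundredths of a log unit of each other, even though their individual coefficients differ noticeably.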

To see why these models had such different coefficients, we developed our own relationships using a subset of our Open Data (compounds with Abraham Descriptors) which had logP values in the LOGKOW database. First, fitting only the coefficients e, s, a, b, and v with c fixed at zero (assuming c has no physical/chemical meaning), we obtained

logP = 0.00 + 0.579E - 0.730S - 0.227A - 3.300B + 3.543V (3)

which is comparable to equation 1. Second, we let c float and obtained:

logP = 0.432 + 0.669E - 0.815S - 0.305A - 3.351B + 3.247V (4)

which is comparable to equation 2. Thus even though equations 1 and 2 may have very different regression coefficients, they seem to be roughly equivalent.
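The two fits described above (c fixed at zero versus c allowed to float) can be sketched as an ordinary least-squares problem. This is an illustrative, dependency-free sketch and not the procedure actually used for equations 3 and 4: the helper names solve and fit are hypothetical, the descriptor values are random numbers rather than Open Data, and the logP values are generated from equation 2 without noise. In practice a library routine such as numpy.linalg.lstsq would be the more usual choice.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(rows, y, intercept):
    """Least-squares fit via the normal equations; returns [c, e, s, a, b, v]."""
    X = [([1.0] if intercept else []) + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return beta if intercept else [0.0] + beta


# Synthetic (E, S, A, B, V) rows; logP generated from equation 2 for illustration only.
rng = random.Random(0)
descriptors = [[rng.uniform(0.0, 2.0) for _ in range(5)] for _ in range(50)]
true = [0.395, 0.738, -0.586, -0.338, -2.972, 2.74]
logp = [true[0] + sum(t * x for t, x in zip(true[1:], row)) for row in descriptors]

c_fixed = fit(descriptors, logp, intercept=False)
c_float = fit(descriptors, logp, intercept=True)
print("c fixed at 0:", [round(v, 3) for v in c_fixed])
print("c floating:  ", [round(v, 3) for v in c_float])
```

In this synthetic case, letting c float recovers the generating coefficients, while forcing c to zero pushes the missing constant into the other five coefficients. That is one way two coefficient sets can look quite different while describing much the same data.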

To assess the predictive ability of these equations, we applied them to two independent test sets (kindly provided by Igor Tetko). These are the same test sets used by Mannhold et al. to evaluate 36 commonly used logP models. The results are presented in the table below; equation 2 performs best, with the lowest RMSE and the highest R2 on both test sets.

Equation   TS1 RMSE   TS1 R2   TS2 RMSE   TS2 R2
1          1.33       0.34     1.57       0.46
2          1.05       0.57     1.39       0.57
3          1.35       0.32     1.50       0.50
4          1.08       0.56     1.46       0.53
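
For readers who want to reproduce statistics like those in the table, the sketch below computes RMSE and R2 for a list of observed and predicted logP values. The page does not state which definition of R2 was used; this sketch assumes the coefficient of determination (1 - SS_res/SS_tot), and the squared Pearson correlation is the other common choice. The function names and the four data points are made up for illustration.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (an assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up logP values, not taken from either test set.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
print(f"RMSE = {rmse(observed, predicted):.2f}, R2 = {r_squared(observed, predicted):.2f}")
```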

Conclusion

When comparing the results listed in the table above with the methods tested by Mannhold et al., we see that using equation 2 with predicted Abraham Descriptors gives a second-tier model (out of three tiers). That is, using the linear relationship given in equation 2 with predicted Abraham Descriptors gives reasonable logP values, but they are not as good as those from some proprietary non-linear models such as ACDLogP and ALOGPS. The advantage of our model is that it is an Open Model based upon Open Data and Open CDK Descriptors and is released under a CC0 license, though we do note that both ACDLogP values and ALOGPS values are freely available from ChemSpider and VCCLAB respectively.

Interestingly, the equations with a non-zero c-value perform better on the independent test sets. This suggests that some physical/chemical process is being picked up by the c-coefficient. This has been suggested previously by Van Noort, [3] who suggests that the c-coefficient is related to the "solvent packing density," whereas our analysis finds a correlation between the c-coefficient and the number of hydrogen bond donors and the solvent's molecular branching.

Update 20120716: Post-processing of the data led to the discovery that several of the test-set compounds are salts. This means that the predictive ability of the method should be slightly better than listed above, though still not as good as non-linear methods.

References

1. M. H. Abraham, H. S. Chadha, G. S. Whiting and R. C. Mitchell, J. Pharm. Sci., 83, 1085-1100 (1994) DOI: 10.1002/jps.2600830806
2. Raimund Mannhold, Gennadiy I. Poda, Claude Ostermann, and Igor V. Tetko. Calculation of Molecular Lipophilicity: State of the Art and Comparison of Log P Methods on More Than 96000 Compounds. DOI: 10.1002/jps.21494
3. Paul C.M. van Noort. Solvation thermodynamics and the physical–chemical meaning of the constant in Abraham solvation equations. Chemosphere (2011) DOI: 10.1016/j.chemosphere.2011.11.073