ADlogP001

An open logP model based upon Abraham Descriptors

**Researchers:** Jean-Claude Bradley, William E. Acree, Jr., Andrew SID Lang

Background
Linear free energy relationships (LFERs) are used to model numerous physical/chemical absorption and partition properties. Many of the published relationships use Abraham Descriptors as their solute descriptors. Thus, given an LFER, we can use our open model for Abraham Descriptors to predict the corresponding absorption or partition property. For example, given an LFER for logP, we can feed predicted Abraham Descriptors from our model into it to predict logP.
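
All of the logP relationships discussed on this page share the general Abraham LFER form, written out below for reference. Uppercase letters are the solute descriptors (as usually defined in the Abraham literature: E, excess molar refraction; S, dipolarity/polarizability; A, overall hydrogen-bond acidity; B, overall hydrogen-bond basicity; V, McGowan characteristic volume), and lowercase c, e, s, a, b, v are the regression coefficients fitted for a particular partition system.

```latex
\log P = c + eE + sS + aA + bB + vV
```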

Procedure
Abraham //et al.// [1] published a relationship for logP based on 613 compounds with known Abraham solute descriptors:

logP = 0.088 + 0.562E - 1.054S + 0.034A - 3.460B + 3.814V (1)

More recently, Mannhold //et al.// [2] published 'updated' coefficients:

logP = 0.395 + 0.738E - 0.586S - 0.338A - 2.972B + 2.74V (2)
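
To make the comparison concrete, here is a minimal Python sketch that encodes equations 1 and 2 as coefficient tuples and evaluates both for the same solute. The function name `lfer_logp` and the descriptor values are hypothetical, chosen only for illustration; they are not taken from the source data.

```python
# Coefficients (c, e, s, a, b, v) transcribed from equations 1 and 2.
EQ1 = (0.088, 0.562, -1.054, 0.034, -3.460, 3.814)
EQ2 = (0.395, 0.738, -0.586, -0.338, -2.972, 2.74)


def lfer_logp(coeffs, E, S, A, B, V):
    """Evaluate logP = c + eE + sS + aA + bB + vV for one solute."""
    c, e, s, a, b, v = coeffs
    return c + e * E + s * S + a * A + b * B + v * V


# Hypothetical Abraham descriptors, used only to illustrate the call.
solute = dict(E=0.80, S=0.90, A=0.30, B=0.50, V=1.20)

for name, coeffs in (("equation 1", EQ1), ("equation 2", EQ2)):
    print(f"{name}: logP = {lfer_logp(coeffs, **solute):.2f}")
```

For these inputs the two equations give about 2.45 and 2.16 respectively.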

To understand why these two models have such different coefficients, we fitted our own relationships to the subset of our Open Data that has logP values in the LOGKOW database. First, fitting only e, s, a, b, and v with the intercept c fixed at zero (i.e., assuming c has no physical/chemical meaning), we obtained

logP = 0.00 + 0.579E - 0.730S - 0.227A - 3.300B + 3.543V (3)

which is comparable to equation 1. Second, we let c float (fitting it along with the other coefficients) and obtained:

logP = 0.432 + 0.669E - 0.815S - 0.305A - 3.351B + 3.247V (4)

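which is comparable to equation 2. Thus, even though equations 1 and 2 have very different regression coefficients, they appear to be roughly equivalent: the differences track whether the intercept c is close to zero (equations 1 and 3) or substantially positive (equations 2 and 4).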
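
The two fitting strategies can be sketched with ordinary least squares. This is a minimal illustration, assuming NumPy is available, that runs `numpy.linalg.lstsq` on synthetic data generated from equation 4 plus noise; it does not use the LOGKOW-matched Open Data, and the descriptor ranges, sample size, noise level, and the helper name `fit_lfer` are assumptions chosen only for illustration.

```python
import numpy as np

# Synthetic stand-in for the descriptor table: rows are solutes, columns E, S, A, B, V.
# Randomly generated; NOT the Open Data subset used on this page.
rng = np.random.default_rng(0)
X = rng.uniform(low=[0.0, 0.0, 0.0, 0.0, 0.5],
                high=[2.0, 2.0, 1.0, 1.5, 3.0], size=(200, 5))

# "Observed" logP generated from equation 4 plus noise, so the true c is non-zero.
true_c = 0.432
true_coefs = np.array([0.669, -0.815, -0.305, -3.351, 3.247])
y = true_c + X @ true_coefs + rng.normal(scale=0.1, size=len(X))


def fit_lfer(X, y, fit_intercept):
    """Least-squares fit of logP = c + eE + sS + aA + bB + vV.

    fit_intercept=False forces c = 0 (the strategy behind equation 3);
    fit_intercept=True estimates c along with the slopes (as for equation 4).
    """
    design = np.column_stack([np.ones(len(X)), X]) if fit_intercept else X
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    if fit_intercept:
        return coefs[0], coefs[1:]
    return 0.0, coefs


for label, flag in (("c fixed at 0", False), ("c free", True)):
    c, (e, s, a, b, v) = fit_lfer(X, y, flag)
    print(f"{label}: c={c:.3f} e={e:.3f} s={s:.3f} a={a:.3f} b={b:.3f} v={v:.3f}")
```

With a non-zero intercept in the generating model, the free-intercept fit recovers coefficients close to the generating ones, while the zero-intercept fit shifts the slopes to absorb the missing constant.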

To test the predictive ability of these equations, we applied them to two independent test sets, TS1 and TS2 (kindly provided by Igor Tetko). These are the same test sets that Mannhold //et al.// used to evaluate 36 commonly used logP models. The results are presented in the table below; equation 2 performs best on both test sets, with the lowest RMSE and the highest R² in each.
|| Equation || TS1 RMSE || TS1 R² || TS2 RMSE || TS2 R² ||
|| 1 || 1.33 || 0.34 || 1.57 || 0.46 ||
|| 2 || 1.05 || 0.57 || 1.39 || 0.57 ||
|| 3 || 1.35 || 0.32 || 1.50 || 0.50 ||
|| 4 || 1.08 || 0.56 || 1.46 || 0.53 ||
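
The two statistics in the table can be computed as follows. This is a minimal sketch in plain Python; it assumes R² means the squared Pearson correlation between observed and predicted logP, which the page does not state (the coefficient of determination, 1 - SS_res/SS_tot, can give different numbers). The observed and predicted values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted logP."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation (assumed definition of the table's R2)."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Hypothetical observed and predicted logP values for a tiny test set.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 1.5, 3.5, 3.5]
print(f"RMSE = {rmse(obs, pred):.3f}, R2 = {r_squared(obs, pred):.3f}")
```

For these inputs the sketch prints RMSE = 0.500 and R2 = 0.800.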

Conclusion
Comparing the results in the table above with the methods tested by Mannhold //et al.//, we see that equation 2 with predicted Abraham Descriptors gives a second-tier model (out of three tiers). That is, the linear relationship in equation 2 with predicted Abraham Descriptors gives reasonable logP values, but they are not as good as those of some proprietary non-linear models such as ACDLogP and ALOGPS. The advantage of our model is that it is an Open Model based on Open Data and Open CDK Descriptors and is released under a CC0 license, though we note that both ACDLogP and ALOGPS values are freely available from ChemSpider and VCCLAB, respectively.

Interestingly, the equations with a substantially non-zero c-value (equations 2 and 4) perform better on the independent test sets. This suggests that the c-coefficient captures some physical/chemical process. This has been suggested previously by Van Noort [3], who proposed that the c-coefficient is related to the "solvent packing density," whereas our analysis finds a correlation between the c-coefficient and the number of hydrogen bond donors and the solvent's molecular branching.

Update 20120716: Post-processing of the data revealed that several of the test-set compounds are salts. The predictive ability of the method is therefore likely somewhat better than the table above indicates, though still not as good as that of the non-linear methods.
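
The page does not say how the salts were identified. One simple heuristic, sketched below as an assumption, is to flag SMILES strings containing a "." (which separates disconnected components, such as the ions of a salt) and recompute the error without them. The records and the helper names `is_multicomponent` and `rmse` are hypothetical; note the heuristic also flags non-salt mixtures and misses salts drawn as a single component.

```python
import math

# Hypothetical test-set records: (SMILES, observed logP, predicted logP).
# All values are made up for illustration only.
records = [
    ("CCO", -0.3, -0.1),
    ("c1ccccc1", 2.1, 2.0),
    ("C[N+](C)(C)C.[Cl-]", -3.0, 0.5),  # multi-component entry (salt)
    ("CC(=O)[O-].[Na+]", -4.0, -0.5),   # multi-component entry (salt)
]


def is_multicomponent(smiles):
    """Heuristic: '.' in SMILES separates disconnected components."""
    return "." in smiles


def rmse(rows):
    """RMSE over (smiles, observed, predicted) tuples."""
    return math.sqrt(sum((obs - pred) ** 2 for _, obs, pred in rows) / len(rows))


kept = [r for r in records if not is_multicomponent(r[0])]
print(f"all entries: RMSE = {rmse(records):.2f} (n={len(records)})")
print(f"salts removed: RMSE = {rmse(kept):.2f} (n={len(kept)})")
```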