A Quick Rule of Thumb for the Domain of Applicability of ADModel003
To provide a simple rule of thumb for the domain of applicability (DOA) of Abraham Descriptor Model 003 (ADModel003).
Starting with the same dataset as ADModel003, we kept only the molecules that had measured values for all five Abraham descriptors: E, S, A, B, and V. We then calculated predicted values for all the descriptors using the RF models from ADModel003. Because the model predicts five different descriptors, each with a different standard deviation, all measured and predicted values for each descriptor were first divided by the standard deviation of that descriptor's measured values. The Euclidean (straight-line) distance in 5D space between the scaled measured and predicted descriptors was then calculated; we call this the 5D-error. See this
for calculation details. This gives a good measure of the overall prediction error across all descriptors for each molecule.
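The 5D-error described above can be sketched in a few lines. This is a minimal illustration, not the calculation sheet linked above: it assumes each molecule is a dict keyed by descriptor name and that the sample standard deviation (n - 1) is used, which the original does not specify.

```python
import math
from statistics import stdev

DESCRIPTORS = ("E", "S", "A", "B", "V")


def descriptor_sds(measured_rows):
    """Standard deviation of each *measured* descriptor across the dataset.

    Assumption: sample SD (n - 1); the source does not say which SD was used.
    """
    return {d: stdev(row[d] for row in measured_rows) for d in DESCRIPTORS}


def error_5d(measured, predicted, sds):
    """Euclidean distance between measured and predicted descriptors after
    dividing every value of each descriptor by that descriptor's measured SD."""
    return math.sqrt(
        sum(((measured[d] - predicted[d]) / sds[d]) ** 2 for d in DESCRIPTORS)
    )
```

Dividing the difference by the SD is the same as dividing the measured and predicted values separately, so no descriptor dominates the distance just because it varies over a wider numeric range.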
The DMax Chemistry Assistant was used in a similar manner to
an early exploration of methanol solubility data
in order to find relationships that explain high and low values for the 5D-error. DMax automatically finds "scientific hypotheses that best match measurements of activity (or any other observable property) of small molecules. It also makes a statistical estimate of the confidence you can have in each hypothesis."
The results of the DMax run are presented in the tables below:
Why High Error?
XLogP < -0.03
MLogP < 1.63 AND TopoPSA > 31.8
The compound contains a phenol
The compound contains a hetero atom
A 5-ring is connected to a general functional group by a single bond
Why Low Error?
TopoPSA < 1.62 AND MLogP > 1.74 AND AMR > 8.25
TopoPSA < 26.59
These results show that whether a molecule gets a large or small error depends strongly on its topological polar surface area (TopoPSA) and on its logarithm of the 1-octanol/water partition coefficient (logP, here estimated as XLogP or MLogP). TopoPSA is the more important of the two: fitting separate linear models of the 5D-error against TopoPSA and against XLogP gave R2 values of 0.5045 and 0.1257, respectively.
Plotting the chemical space in Tableau Public, with XLogP and TopoPSA as the x and y coordinates and points colored by 5D-error (red = high error), shows that certain regions of the chemical space correspond, on average, to high errors, whereas other regions correspond, on average, to low errors.
By analyzing the regions identified by the DMax hypotheses and the geometry of the figure above, we arrive at a quick rule of thumb for the domain of applicability of ADModel003: molecules with all of the following properties will, on average, have significantly better predictions than molecules without them:
XLogP > 0,
TopoPSA < 45, and
TopoPSA/(10 - XLogP) < 5.
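The three conditions above translate directly into a check. This is a minimal sketch, and `in_doa` is a hypothetical helper name. The rule does not say how to handle XLogP >= 10, where the denominator of the third condition is zero or negative, so the sketch assumes such molecules fall outside the DOA.

```python
def in_doa(xlogp: float, topopsa: float) -> bool:
    """Rule-of-thumb check for the ADModel003 domain of applicability."""
    # Condition 1: XLogP > 0
    if xlogp <= 0:
        return False
    # Condition 2: TopoPSA < 45
    if topopsa >= 45:
        return False
    # Assumption (not in the original rule): XLogP >= 10 is treated as outside
    # the DOA, since 10 - XLogP would be zero or negative there.
    if xlogp >= 10:
        return False
    # Condition 3: TopoPSA / (10 - XLogP) < 5
    return topopsa / (10 - xlogp) < 5
```

For 0 < XLogP < 10 the third condition is equivalent to TopoPSA < 50 - 5 XLogP, a straight line in the XLogP/TopoPSA plane. This makes it easy to draw the boundary on the Tableau plot.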
Applying this rule of thumb to the original dataset gives the following:
[Table: results grouped by "Inside DOA?"]
While our model is currently the best available (as of 2012-11-15), it can sometimes give large errors for certain molecules. Care should be taken when using predicted values for compounds outside the domain of applicability described above. Even for compounds inside it, additional considerations may apply. For example, DMax suggests that the model performs better on molecules without heteroatoms. In particular, compounds that contain a phenol appear problematic, and to a lesser but still significant extent, so do compounds in which a 5-ring is connected to a general functional group by a single bond.
1. DMax Chemistry Assistant: