To create two separate random forest models for each of the Abraham model solvent coefficients c, e, s, a, b, and v. One model for a general diverse set of solvents (after outliers have been removed) and one model for a limited set of solvents. All content, models and data are released as CC0 - the default license for all ONS work.

Procedure

Model002a: Our previous analysis reduced our set of 78 solvents with CDK-descriptors, calculated using Rajarshi Guha's CDK Descriptor Calculator, to one of 63 solvents by eliminating four solvents that lay outside of the main chemical space (tributyl phosphate, isopropyl myristate, hexadecane, and octadecanol), 7 solvents with c-coefficient less than -0.058 (DMF, dimethylacetamide, ethylene glycol, nitrobenzene, DMSO, iodobenzene, and formamide) under the assumption that solvents should not have negative values for c, [1] and 4 additional solvents identified as consensus (over all coefficients) model outliers (trifluoroethanol, carbon disulfide, nitromethane, and N-formylmorpholine). Model002a-Chemical Space: The chemical space shown below was created using Tableau Public v7.0 with principal components calculated using the following code in R 2.13.0:

## load in data
mydata = read.csv(file="20120126SolventsWithDescriptorsForPCA.csv",head=TRUE,row.names="molID")
data <- mydata[, 1:129] ## Don't include molID in pca
pc1 <- prcomp(data, scale. = T)
x <- pc1$x
summary(pc1)
[output]
Importance of components:
PC1 PC2 PC3
Standard deviation 6.2811 4.3032 3.47059
Proportion of Variance 0.3058 0.1436 0.09337
Cumulative Proportion 0.3058 0.4494 0.54275
[output]

The colours in the above chemical space correspond to the third principal component. Model002a-RFc: A random forest model for the c-coefficient was created using the following R code:

library(randomForest)
## load in data
mydata = read.csv(file="20120126SolventsWithDescriptorsc.csv",head=TRUE,row.names="molID")
## do random forest [randomForest 4.5-34]
mydata.rf <- randomForest(c ~ ., data = mydata,importance = TRUE)
## Show important descriptors
varImpPlot(mydata.rf)
## Summary of random forest model
print(mydata.rf)
[output]
Call:
randomForest(formula = c ~ ., data = mydata, importance = TRUE)
Type of random forest: regression
Number of trees: 500
No. of variables tried at each split: 43
Mean of squared residuals: 0.006481098
% Var explained: 44.65
[output]
## predict using the random forest model
test.predict <- predict(mydata.rf,mydata)
## write the predictions to the working directory
write.csv(test.predict, file = "RFTestPredict.csv")

The variable importance can be seen in the following plot (nHBDon, WTPT.2, etc.)

The model was exported using the following code and is available for download: crfmodel

A plot of predicted values vs. measured values is shown below coloured from green to red by AE.

A QSARDB model was created (annotated example) through the command line (clog.txt) as is also available for download: AbrahamSolventc.qdb.zip
The training set R2 for the random forest model 0.9083 (OOB R2 0.4465 - approximate test-set R2).

Model002a-RFe: A random forest model for the e-coefficient was created using the following R code:

library(randomForest)
## load in data
mydata = read.csv(file="20120126SolventsWithDescriptorse.csv",head=TRUE,row.names="molID")
## do random forest [randomForest 4.6-6]
mydata.rf <- randomForest(e ~ ., data = mydata,importance = TRUE)
## Show important descriptors
varImpPlot(mydata.rf)
## Summary of random forest model
print(mydata.rf)
[output]
Call:
randomForest(formula = e ~ ., data = mydata, importance = TRUE)
Type of random forest: regression
Number of trees: 500
No. of variables tried at each split: 43
Mean of squared residuals: 0.01895079
% Var explained: 40.21
[output]
## predict using the random forest model
test.predict <- predict(mydata.rf,mydata)
## write the predictions to the working directory
write.csv(test.predict, file = "RFTestPredict.csv")

The training-set R2 = 0.9019, OOB R2 = 0.4021. The variable importance can be seen in the following plot (XLogP, ATSc1, etc.)

The model was exported using the following code and is available for download: erfmodel

A plot of predicted values vs. measured values is shown below coloured from green to red by AE.

A QSARDB model was created through the command line and is also available for download: AbrahamSolvente.qdb.zip
The training set R2 value for the random forest model 0.9019 (OOB R2 0.4021 - approximate test-set R2).

Model002a-RFs: A random forest model for the s-coefficient was created using the following R code:

library(randomForest)
## load in data
mydata = read.csv(file="20120126SolventsWithDescriptorss.csv",head=TRUE,row.names="molID")
## do random forest [randomForest 4.6-6]
mydata.rf <- randomForest(s ~ ., data = mydata,importance = TRUE)
## Show important descriptors
varImpPlot(mydata.rf)
## Summary of random forest model
print(mydata.rf)
[output]
Call:
randomForest(formula = s ~ ., data = mydata, importance = TRUE)
Type of random forest: regression
Number of trees: 500
No. of variables tried at each split: 43
Mean of squared residuals: 0.07565735
% Var explained: 81.47
[output]
## predict using the random forest model
test.predict <- predict(mydata.rf,mydata)
## write the predictions to the working directory
write.csv(test.predict, file = "RFTestPredict.csv")

The training-set R2 = 0.9700, OOB R2 = 0.8147. The variable importance can be seen in the following plot (ATSc1, nAtomP, etc.)

The model was exported using the following code and is available for download: srfmodel

A plot of predicted values vs. measured values is shown below coloured from green to red by AE.

A QSARDB model was created through the command line and is also available for download: AbrahamSolvents.qdb.zip

Model002a-RFa: A random forest model for the a-coefficient was created using the following R code:

library(randomForest)
## load in data
mydata = read.csv(file="20120126SolventsWithDescriptorsa.csv",head=TRUE,row.names="molID")
## do random forest [randomForest 4.6-6]
mydata.rf <- randomForest(a ~ ., data = mydata,importance = TRUE)
## Show important descriptors
varImpPlot(mydata.rf)
## Summary of random forest model
print(mydata.rf)
[output]
Call:
randomForest(formula = a ~ ., data = mydata, importance = TRUE)
Type of random forest: regression
Number of trees: 500
No. of variables tried at each split: 43
Mean of squared residuals: 0.1258657
% Var explained: 95.6
[output]
## predict using the random forest model
test.predict <- predict(mydata.rf,mydata)
## write the predictions to the working directory
write.csv(test.predict, file = "RFTestPredict.csv")

The training-set R2 = 0.9924, OOB R2 = 0.9560. The variable importance can be seen in the following plot (nHBAcc, TopoPSA, WTPT.4, etc.)

The model was exported using the following code and is available for download: arfmodel

A plot of predicted values vs. measured values is shown below coloured from green to red by AE.

A QSARDB model was created through the command line and is also available for download: AbrahamSolventa.qdb.zip

Model002a-RFb: A random forest model for the b-coefficient was created using the following R code:

library(randomForest)
## load in data
mydata = read.csv(file="20120126SolventsWithDescriptorsb.csv",head=TRUE,row.names="molID")
## do random forest [randomForest 4.6-6]
mydata.rf <- randomForest(b ~ ., data = mydata,importance = TRUE)
## Show important descriptors
varImpPlot(mydata.rf)
## Summary of random forest model
print(mydata.rf)
[output]
Call:
randomForest(formula = b ~ ., data = mydata, importance = TRUE)
Type of random forest: regression
Number of trees: 500
No. of variables tried at each split: 43
Mean of squared residuals: 0.05920697
% Var explained: 74.64
[output]
## predict using the random forest model
test.predict <- predict(mydata.rf,mydata)
## write the predictions to the working directory
write.csv(test.predict, file = "RFTestPredict.csv")

The training-set R2 = 0.9517, OOB R2 = 0.7464. The variable importance can be seen in the following plot (khs.sOH, nHBDon, etc.)

The model was exported using the following code and is available for download: brfmodel

A plot of predicted values vs. measured values is shown below coloured from green to red by AE.

A QSARDB model was created through the command line and is also available for download: AbrahamSolventb.qdb.zip

Model002a-RFv: A random forest model for the v-coefficient was created using the following R code:

library(randomForest)
## load in data
mydata = read.csv(file="20120126SolventsWithDescriptorsv.csv",head=TRUE,row.names="molID")
## do random forest [randomForest 4.6-6]
mydata.rf <- randomForest(v ~ ., data = mydata,importance = TRUE)
## Show important descriptors
varImpPlot(mydata.rf)
## Summary of random forest model
print(mydata.rf)
[output]
Call:
randomForest(formula = v ~ ., data = mydata, importance = TRUE)
Type of random forest: regression
Number of trees: 500
No. of variables tried at each split: 43
Mean of squared residuals: 0.03566637
% Var explained: 68.16
[output]
## predict using the random forest model
test.predict <- predict(mydata.rf,mydata)
## write the predictions to the working directory
write.csv(test.predict, file = "RFTestPredict.csv")

The training-set R2 = 0.9483, OOB R2 = 0.6816. The variable importance can be seen in the following plot (TopoPSA, XLogP, etc.)

The model was exported using the following code and is available for download: vrfmodel

A plot of predicted values vs. measured values is shown below coloured from green to red by AE.

A QSARDB model was created through the command line and is also available for download: AbrahamSolventv.qdb.zip

Results - Model002a (General)

The following table summarizes the results, including the Training Set R2 (R2) and the Out of Bag R2 (OOB R2) which is a good estimate of utility.

parameter

R2

OOB R2

meaning[1,2]

ImpVar

c

0.91

0.45

log of the solvent–water ratio of the volume packing densities

nHBDon,WTPT.2

e

0.90

0.40

general solute-solvent dispersion interactions

XLogP, ATSc1

s

0.97

0.81

dipole-dipole and dipole-induced interactions

ATSc1, nAtomP

a

0.99

0.96

hydrogen-bond basicity

nHBAcc, TopoPSA, WTPT.4

b

0.95

0.75

hydrogen-bond acidity

khs.sOH, nHBDon

v

0.95

0.68

associated with cavity formation

TopoPSA, XLogP

The models were used to predict the parameters of all the solvents in the test-set and a consensus measure of accuracy was attained by summing the squares of the standardized absolute errors across all parameters. Using this measure we identify the model outliers to be acetonitrile and sulfolane. The chart below, showing the consensus error for each solvent, also gives information as to how the models perform on the different classes on solvents. The models perform better for compounds that only contain carbon (non-aliphatic rings), oxygen and hydrogen atoms.

Predicting the Abraham parameters for new solvents

1863 compounds with a melting point between -100 C and 0 C were taken from the most recent Open melting point dataset (ONSMP031). All CDK descriptors (except CPSA, Ionization Potential, WHIM, all Protein descriptors, and all Geometrical descriptors) were calculated using Rajarshi Guha's CDK Descriptor Calculator v1.3.2. The descriptors Kier3 and HybRatio were removed post-calculation due to a significant number of NA values. The solvent parameters for 1863 compounds were then calculated using the random forest models above, see the following R code:

The Abraham parameters were then used to predict the 25C-solubility of trans-cinnamic acid in all 1863 solvents using the latest Abraham descriptors for trans-cinnamic acid.

Model002b

Here we limit our models to solvents that only contain carbon, oxygen, and hydrogen atoms. We started with the original list of 78 solvents and then removed solvents that were not comprised of only carbon, oxygen, and hydrogen atoms (1,2-dichloroethane, 1-chlorobutane, acetonitrile, benzonitrile, bromobenzene, carbon disulfide, carbon tetrachloride, chlorobenzene, chloroform, dibutylformamide, dichloromethane, diethylacetamide, dimethylacetamide, DMF, DMSO, fluorobenzene, formamide, iodobenzene, N-ethylacetamide, N-ethylformamide, N-formylmorpholine, nitrobenzene, nitromethane, N-methyl-2-piperidone, N-methylacetamide, N-methylformamide, N-methylpyrrolidinone, sulfolane, tributyl phosphate, trifluoroethanol), This left us with 48 solvents.

Removing Outliers: Some of the 48 solvents were removed as outliers previously but are considered here, including solvents with negative c-parameter, thus before modeling we proceed by removing a new set of outliers exclusively identified step-wise using our consensus measure outlined above listed with the most suspect parameter: ethylene glycol (c = 2.729), isopropyl myristate (c = -0.605), propylene carbonate (v = 3.412), octadecanol (e = 0.148), methylcyclohexane (e = 0.782), cyclohexane (v = 4.577), cyclohexanone (e = 0.225), methanol (v = 3.549). This left us with 40 solvents.

Model002b-RandomForest: Random forest models were created for the Abraham solvent parameters as above. The following table summarizes the results, including the Training Set R2 (R2) and the Out of Bag R2 (OOB R2) which is a good estimate of utility.

parameter

R2

OOB R2

meaning[1,2]

ImpVar

c

0.93

0.61

log of the solvent–water ratio of the volume packing densities

khs.sOH, nHBDon

e

0.94

0.65

general solute-solvent dispersion interactions

MDEC.12, nAtomLAC

s

0.97

0.81

dipole-dipole and dipole-induced interactions

nAtomLAC, ATSc1

a

0.99

0.98

hydrogen-bond basicity

nHBDon, ATSc2

b

0.98

0.90

hydrogen-bond acidity

khs.sOH, nHBDon

v

0.96

0.77

associated with cavity formation

TopoPSA, BCUTp.1l

The chart below, showing the consensus error for each solvent, also gives information as to how the models perform on the different classes on solvents within the set. The models seems to perform worse, in general, for compounds with aliphatic rings.

The predicted Abraham parameters were compared to those from model002a and it was found that no statistical improvement was gained when restricting to a smaller subset. Thus since model002a is built using a more diverse set of solvents, it is recommended as the model of choice to predict the Abraham parameters for new solvents.

Classifying Solvents

We have derived models for predicting the Abraham solvent parameters that have OOB R2-values ranging from 0.40 to 0.96. Using these models we provide a combined list of 1939 solvents with Abraham parameters (88 measured-20120218 and 1851 predicted) partitioned into 12 groups using K-means clustering:

aprotic dipolar (82%), weak electron pair donor bases (18%)

Looking at the above table, we see two interesting results: 1. Group one contains no predicted solvents, where all the other groups have over 100. This means the solvents of group one are atypical/special. 2. Group five has 245 predicted solvents but no solvents with measured Abraham parameters. This suggests that a whole entire region of the chemical space has yet to be explored. The predicted solvents of group five a typified by compounds like: ethyl undec-10-enoate, trielaidoylglycerol, octyl acetate, aramite, beta-ionone, beta-butoxyethyl phthalate, 6-methylheptyl 2-(2,4-dichlorophenoxy)propanoate, 1,1,5-Trimethyl-3,3-bis[(2-methyl-2-propanyl)peroxy]cyclohexane. The compounds of group five are most similar to those of group group six, except in that they have a much larger values for 'e'.

Classifying the groups in classical terms (polar, apolar, etc) by comparing the groups to those derived by Durand et. al. [3], we see a good correlation between the 12 groups and classical solvent groupings with group 1 being polar, groups 2, 3, and 11 being apolar, group 4 being assymetric halgonated hydrocarbons, group 5 being weak electron pair donor bases, groups 6, 9, and 12 being dipolar, group 7 being polar protic, group 8 being amphiprotic; and group 10 being a mixture of aprotic dipolar and apolar compounds but on closer examination, we see that group 10 consists entirely of substituted benzene rings. Perfect correlation cannot be expected because our partition consists of 12 groups whereas Durand's consists of 10. Also, our method of partitioning may focus on slightly different structure properties, group 10 for example which consists entirely of substituted benzene rings, yet Durand splits the group into two.

Using principal component analysis on the full set of solvents we can see the groupings in chemical space. This graph is interactive - hover over the data point to see info - select multiple classes using ctrl-click. We see that distinct lines are hard to draw and compounds situated on the edge of two or more groups will share similarities to surrounding compounds, even if they are from different groups.

[1] Paul C.M. van Noort. Solvation thermodynamics and the physical–chemical meaning of the constant in Abraham solvation equations. Chemosphere (2011), doi:10.1016/j.chemosphere.2011.11.073
[2] Laura M. Grubbs, Mariam Saifullah, Nohelli E. De La Rosa, Shulin Ye, Sai S. Achi, William E. Acree Jr., Michael H. Abraham. Mathematical correlations for describing solute transfer into functionalized alkane solvents containing hydroxyl, ether, ester or ketone solvents. Fluid Phase Equilibria 298 (2010) 48–53, doi:10.1016/j.fluid.2010.07.00
[3] Morgan Durand,Valrie Molinier, Werner Kunz, and Jean-Marie Aubry. Classification of Organic Solvents Revisited by Using the COSMO-RS
Approach. Chem. Eur. J. (2011), doi:10.1002/chem.201001743

## CDK-Based Solvent Abraham Coefficient Models

Researcher:Andrew Lang## Objective

To create two separate random forest models for each of the Abraham model solvent coefficients c, e, s, a, b, and v. One model for a general diverse set of solvents (after outliers have been removed) and one model for a limited set of solvents. All content, models and data are released as CC0 - the default license for all ONS work.## Procedure

Model002a:Our previous analysis reduced our set of 78 solvents with CDK-descriptors, calculated using Rajarshi Guha's CDK Descriptor Calculator, to one of 63 solvents by eliminating four solvents that lay outside of the main chemical space (tributyl phosphate, isopropyl myristate, hexadecane, and octadecanol), 7 solvents with c-coefficient less than -0.058 (DMF, dimethylacetamide, ethylene glycol, nitrobenzene, DMSO, iodobenzene, and formamide) under the assumption that solvents should not have negative values for c, [1] and 4 additional solvents identified as consensus (over all coefficients) model outliers (trifluoroethanol, carbon disulfide, nitromethane, and N-formylmorpholine).Model002a-Chemical Space:The chemical space shown below was created using Tableau Public v7.0 with principal components calculated using the following code in R 2.13.0:The colours in the above chemical space correspond to the third principal component.

Model002a-RFc:A random forest model for the c-coefficient was created using the following R code:The variable importance can be seen in the following plot (nHBDon, WTPT.2, etc.)

The model was exported using the following code and is available for download: crfmodel

A plot of predicted values vs. measured values is shown below coloured from green to red by AE.

A QSARDB model was created (annotated example) through the command line (clog.txt) as is also available for download: AbrahamSolventc.qdb.zip

The training set R2 for the random forest model 0.9083 (OOB R2 0.4465 - approximate test-set R2).

Model002a-RFe:A random forest model for the e-coefficient was created using the following R code:The training-set R2 = 0.9019, OOB R2 = 0.4021. The variable importance can be seen in the following plot (XLogP, ATSc1, etc.)

The model was exported using the following code and is available for download: erfmodel

A plot of predicted values vs. measured values is shown below coloured from green to red by AE.

A QSARDB model was created through the command line and is also available for download: AbrahamSolvente.qdb.zip

The training set R2 value for the random forest model 0.9019 (OOB R2 0.4021 - approximate test-set R2).

Model002a-RFs:A random forest model for the s-coefficient was created using the following R code:The training-set R2 = 0.9700, OOB R2 = 0.8147. The variable importance can be seen in the following plot (ATSc1, nAtomP, etc.)

The model was exported using the following code and is available for download: srfmodel

A plot of predicted values vs. measured values is shown below coloured from green to red by AE.

A QSARDB model was created through the command line and is also available for download: AbrahamSolvents.qdb.zip

Model002a-RFa:A random forest model for the a-coefficient was created using the following R code:The training-set R2 = 0.9924, OOB R2 = 0.9560. The variable importance can be seen in the following plot (nHBAcc, TopoPSA, WTPT.4, etc.)

The model was exported using the following code and is available for download: arfmodel

A plot of predicted values vs. measured values is shown below coloured from green to red by AE.

A QSARDB model was created through the command line and is also available for download: AbrahamSolventa.qdb.zip

Model002a-RFb:A random forest model for the b-coefficient was created using the following R code:The training-set R2 = 0.9517, OOB R2 = 0.7464. The variable importance can be seen in the following plot (khs.sOH, nHBDon, etc.)

The model was exported using the following code and is available for download: brfmodel

A plot of predicted values vs. measured values is shown below coloured from green to red by AE.

A QSARDB model was created through the command line and is also available for download: AbrahamSolventb.qdb.zip

Model002a-RFv:A random forest model for the v-coefficient was created using the following R code:The training-set R2 = 0.9483, OOB R2 = 0.6816. The variable importance can be seen in the following plot (TopoPSA, XLogP, etc.)

The model was exported using the following code and is available for download: vrfmodel

A plot of predicted values vs. measured values is shown below coloured from green to red by AE.

A QSARDB model was created through the command line and is also available for download: AbrahamSolventv.qdb.zip

## Results - Model002a (General)

The following table summarizes the results, including the Training Set R2 (R2) and the Out of Bag R2 (OOB R2) which is a good estimate of utility.## Predicting the Abraham parameters for new solvents

1863 compounds with a melting point between -100 C and 0 C were taken from the most recent Open melting point dataset (ONSMP031). All CDK descriptors (except CPSA, Ionization Potential, WHIM, all Protein descriptors, and all Geometrical descriptors) were calculated using Rajarshi Guha's CDK Descriptor Calculator v1.3.2. The descriptors Kier3 and HybRatio were removed post-calculation due to a significant number of NA values. The solvent parameters for 1863 compounds were then calculated using the random forest models above, see the following R code:The Abraham parameters were then used to predict the 25C-solubility of trans-cinnamic acid in all 1863 solvents using the latest Abraham descriptors for trans-cinnamic acid.

## Model002b

Here we limit our models to solvents that only contain carbon, oxygen, and hydrogen atoms. We started with the original list of 78 solvents and then removed solvents that were not comprised of only carbon, oxygen, and hydrogen atoms (1,2-dichloroethane, 1-chlorobutane, acetonitrile, benzonitrile, bromobenzene, carbon disulfide, carbon tetrachloride, chlorobenzene, chloroform, dibutylformamide, dichloromethane, diethylacetamide, dimethylacetamide, DMF, DMSO, fluorobenzene, formamide, iodobenzene, N-ethylacetamide, N-ethylformamide, N-formylmorpholine, nitrobenzene, nitromethane, N-methyl-2-piperidone, N-methylacetamide, N-methylformamide, N-methylpyrrolidinone, sulfolane, tributyl phosphate, trifluoroethanol), This left us with 48 solvents.Removing Outliers:Some of the 48 solvents were removed as outliers previously but are considered here, including solvents with negative c-parameter, thus before modeling we proceed by removing a new set of outliers exclusively identified step-wise using our consensus measure outlined above listed with the most suspect parameter: ethylene glycol (c = 2.729), isopropyl myristate (c = -0.605), propylene carbonate (v = 3.412), octadecanol (e = 0.148), methylcyclohexane (e = 0.782), cyclohexane (v = 4.577), cyclohexanone (e = 0.225), methanol (v = 3.549). This left us with 40 solvents.Model002b-RandomForest:Random forest models were created for the Abraham solvent parameters as above. The following table summarizes the results, including the Training Set R2 (R2) and the Out of Bag R2 (OOB R2) which is a good estimate of utility.The predicted Abraham parameters were compared to those from model002a and it was found that no statistical improvement was gained when restricting to a smaller subset. Thus since model002a is built using a more diverse set of solvents, it is recommended as the model of choice to predict the Abraham parameters for new solvents.

## Classifying Solvents

We have derived models for predicting the Abraham solvent parameters that have OOB R2-values ranging from 0.40 to 0.96. Using these models we provide a combined list of 1939 solvents with Abraham parameters (88 measured-20120218 and 1851 predicted) partitioned into 12 groups using K-means clustering:The 12 groups have the following statistics:

Classifying the groups in classical terms (polar, apolar, etc) by comparing the groups to those derived by Durand

et. al.[3], we see a good correlation between the 12 groups and classical solvent groupings with group 1 being polar, groups 2, 3, and 11 being apolar, group 4 being assymetric halgonated hydrocarbons, group 5 being weak electron pair donor bases, groups 6, 9, and 12 being dipolar, group 7 being polar protic, group 8 being amphiprotic; and group 10 being a mixture of aprotic dipolar and apolar compounds but on closer examination, we see that group 10 consists entirely of substituted benzene rings. Perfect correlation cannot be expected because our partition consists of 12 groups whereas Durand's consists of 10. Also, our method of partitioning may focus on slightly different structure properties, group 10 for example which consists entirely of substituted benzene rings, yet Durand splits the group into two.Using principal component analysis on the full set of solvents we can see the groupings in chemical space. This graph is interactive - hover over the data point to see info - select multiple classes using ctrl-click. We see that distinct lines are hard to draw and compounds situated on the edge of two or more groups will share similarities to surrounding compounds, even if they are from different groups.

## References

[1] Paul C.M. van Noort. Solvation thermodynamics and the physical–chemical meaning of the constant in Abraham solvation equations. Chemosphere (2011), doi:10.1016/j.chemosphere.2011.11.073[2] Laura M. Grubbs, Mariam Saifullah, Nohelli E. De La Rosa, Shulin Ye, Sai S. Achi, William E. Acree Jr., Michael H. Abraham. Mathematical correlations for describing solute transfer into functionalized alkane solvents containing hydroxyl, ether, ester or ketone solvents. Fluid Phase Equilibria 298 (2010) 48–53, doi:10.1016/j.fluid.2010.07.00

[3] Morgan Durand,Valrie Molinier, Werner Kunz, and Jean-Marie Aubry. Classification of Organic Solvents Revisited by Using the COSMO-RS

Approach. Chem. Eur. J. (2011), doi:10.1002/chem.201001743