Determination of the Abraham model solute descriptors for trans-cinnamic acid from measured solubilities

Professors: Michael H. Abraham, William E. Acree Jr., Jean-Claude Bradley, Andrew S.I.D. Lang

Abstract

The Abraham model solute descriptors can be used in many applications, including the prediction of solubility in the increasing number of solvents that have Abraham parameters. This has obvious benefits for solvent selection in processes such as reaction design or recrystallization. Using solubility measurements from the literature and those collected as part of the Open Notebook Solubility Challenge, we have calculated the descriptors for trans-cinnamic acid to be: E = 1.303, S = 1.073, A = 0.372, B = 0.522, V = 1.171.

Introduction

The solubility (base 10 logarithm of the molar concentration) of a compound in various organic solvents can be approximated using Abraham's general solvation equation
logC_s = logC_w + c + e * E + s * S + a * A + b * B + v * V
where logC_w is the base 10 logarithm of the aqueous solubility; E, S, A, B, and V are the solute specific Abraham descriptors; c, e, s, a, b, and v are the solvent specific Abraham parameters.
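As a minimal sketch of how the equation is evaluated, the R function below adds the solvent terms to the aqueous solubility. The solute descriptors and the aqueous solubility (0.004 M) are taken from this page; the solvent coefficients are hypothetical placeholders chosen only to show the arithmetic, not values for any real solvent.

```r
# Abraham general solvation equation:
# logC_s = logC_w + c + e*E + s*S + a*A + b*B + v*V
predict_logCs <- function(logCw, solvent, solute) {
  logCw + solvent[["c"]] +
    solvent[["e"]] * solute[["E"]] +
    solvent[["s"]] * solute[["S"]] +
    solvent[["a"]] * solute[["A"]] +
    solvent[["b"]] * solute[["B"]] +
    solvent[["v"]] * solute[["V"]]
}

# Solute descriptors for trans-cinnamic acid as reported on this page
solute <- c(E = 1.303, S = 1.073, A = 0.372, B = 0.522, V = 1.171)

# HYPOTHETICAL solvent coefficients (illustration only, not a real solvent)
solvent <- c(c = 0.2, e = 0.3, s = -0.5, a = 0.1, b = -4.0, v = 4.0)

logCw <- log10(0.004)          # measured aqueous solubility, 0.004 M
logCs <- predict_logCs(logCw, solvent, solute)
Cs    <- 10^logCs              # back-transform to molar concentration
```

To predict a real solubility, the placeholder vector would be replaced by the published coefficients for the solvent of interest.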

Calculating the descriptors

Calculating V
The solute descriptor V can be calculated directly from structure. It is equal to the McGowan characteristic volume (in cubic cm per mol) divided by 100. [1] For trans-cinnamic acid, V = 1.171.
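The calculation from structure can be sketched as follows. The atomic volumes (C 16.35, H 8.71, O 12.43 cubic cm per mol) and the correction of 6.56 per bond are the standard McGowan increments as we understand them from the literature; they are not listed on this page, so treat them as assumptions to check against reference [1]. Every bond counts once regardless of bond order, and the bond count is taken as atoms - 1 + rings.

```r
# McGowan characteristic volume for trans-cinnamic acid, C9H8O2
atom_volume     <- c(C = 16.35, H = 8.71, O = 12.43)  # assumed literature increments
bond_correction <- 6.56                               # subtracted once per bond

atom_count <- c(C = 9, H = 8, O = 2)
n_rings    <- 1                                       # one benzene ring

# Number of bonds = number of atoms - 1 + number of rings
n_bonds <- sum(atom_count) - 1 + n_rings

volume <- sum(atom_volume[names(atom_count)] * atom_count) -
  bond_correction * n_bonds                           # cubic cm per mol

V <- volume / 100                                     # Abraham descriptor V
print(V)
```

The result agrees with the value V = 1.171 quoted above to within rounding.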
Calculating E
The solute descriptor E, the excess molar refractivity, is defined as the molar refraction, MR, of the compound calculated using the McGowan volume, less the molar refraction of an alkane with the same McGowan volume. That is, E = MR − 2.83195 V + 0.52553, where MR = 10 (η^2 − 1)/(η^2 + 2) V and η is the refractive index. For solutes that are solid at room temperature, E can be calculated in three main ways:
1. From the ACD Labs predicted molar refractivity available from ChemSpider. Since Abraham descriptors are calculated using the McGowan volume, the ACD Labs value must first be converted using V (intrinsic) = 0.597 + 0.6823 V. [1] For trans-cinnamic acid, E = 1.303.
2. From structure by comparing the solute fragment-wise with compounds with known values for E. For the case of trans-cinnamic acid, comparison can be made with ethyl benzoate (E = 0.689), ethyl cinnamate (E = 1.102), and benzoic acid (E = 0.730). This gives E = 1.143 for cinnamic acid.
3. By letting E float. That is, determine E from regression, see 'calculating S, A, B' below.
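The defining formula for E can be checked numerically. The sketch below implements E = MR − 2.83195 V + 0.52553 and applies it to n-hexane as a sanity check, since an alkane should give E close to zero by construction. The refractive index used for hexane (about 1.3749) and its McGowan descriptor (V = 0.9540) are approximate literature values that do not appear on this page and are assumptions here; no refractive index is given for trans-cinnamic acid, so none is used.

```r
# Excess molar refraction from refractive index (eta) and McGowan descriptor V
excess_molar_refraction <- function(eta, V) {
  MR <- 10 * (eta^2 - 1) / (eta^2 + 2) * V   # molar refraction on the McGowan volume
  MR - 2.83195 * V + 0.52553                 # subtract the alkane of equal volume
}

# Sanity check with n-hexane (assumed literature values, not from this page)
eta_hexane <- 1.3749
V_hexane   <- 0.9540

E_hexane <- excess_molar_refraction(eta_hexane, V_hexane)
print(E_hexane)   # expected to be close to zero for an alkane
```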
Calculating S, A, and B
S (dipolarity/polarizability), A (hydrogen bond acidity), and B (hydrogen bond basicity) can be calculated using GLC [2], RP-HPLC [3], or linear regression from known solubility data or known logP values. [4] We collected solubility values and logP values from the literature and from Bio-Loom (trial version). All values (mole fraction, mass fraction, mass ratio, and logP) were converted to molarity for ease of comparison. None of the Bio-Loom logP values were included in the final regression, as many of them were found to be incorrect by several orders of magnitude.

Additional solubility data were determined in two ways, using either the density method or NMR, as part of the ONSChallenge (live data: solubility of cinnamic acid in various organic solvents).

The measured solubilities were combined with those collected from the literature, resulting in 74 solute/solvent values (molar concentrations) at temperatures ranging from 19.5 °C to 28 °C. The solubility values were all converted to values at 25 °C using the Buchowski equation with the assumption of miscibility at the solute's melting point. Multiple measurements for the same solvent were averaged, giving a total of 33 solute/solvent values; see the table below (data taken 2012-02-09).
solvent                 molar concentration (M)
water                   0.004
cyclohexane             0.027
hexane                  0.045
carbon tetrachloride    0.117
trifluoroethanol        0.168
tetrachloroethylene     0.172
m-xylene                0.201
toluene                 0.253
acetonitrile            0.285
benzene                 0.303
chlorobenzene           0.314
1,2-dichloroethane      0.362
pentachloroethane       0.374
trichloroethylene       0.408
nitrobenzene            0.429
1-octanol               0.537
diethyl ether           0.575
propyl acetate          0.609
2-butanol               0.705
1-pentanol              0.725
tetrachloroethane       0.746
1-butanol               0.771
ethyl acetate           0.775
2-pentanol              0.922
chloroform              0.961
2-propanol              0.967
1-propanol              0.986
ethanol                 1.207
acetone                 1.337
methanol                1.401
cyclohexanone           1.437
THF                     2.367
DMSO                    8.423
Six solvents were identified in which trans-cinnamic acid may dimerize: cyclohexane, hexane, carbon tetrachloride, m-xylene, toluene, and benzene. These solvents were excluded from the regression. Pentachloroethane, tetrachloroethane, tetrachloroethylene, trichloroethylene, and propyl acetate were also excluded, as they currently do not have Abraham solvent parameters. One last solvent, trifluoroethanol, was excluded because a recent analysis has shown that its coefficients are in need of updating.
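The exclusion step can be sketched in R as follows, using the 33 averaged concentrations listed on this page. The variable names are our own and the snippet is only an illustration of the filtering, not the script actually used; it also adds the base 10 logarithm of each concentration, which is the quantity that enters the regression.

```r
# Averaged molar concentrations (M) at 25 C, as tabulated on this page
conc <- c(
  "water" = 0.004, "cyclohexane" = 0.027, "hexane" = 0.045,
  "carbon tetrachloride" = 0.117, "trifluoroethanol" = 0.168,
  "tetrachloroethylene" = 0.172, "m-xylene" = 0.201, "toluene" = 0.253,
  "acetonitrile" = 0.285, "benzene" = 0.303, "chlorobenzene" = 0.314,
  "1,2-dichloroethane" = 0.362, "pentachloroethane" = 0.374,
  "trichloroethylene" = 0.408, "nitrobenzene" = 0.429, "1-octanol" = 0.537,
  "diethyl ether" = 0.575, "propyl acetate" = 0.609, "2-butanol" = 0.705,
  "1-pentanol" = 0.725, "tetrachloroethane" = 0.746, "1-butanol" = 0.771,
  "ethyl acetate" = 0.775, "2-pentanol" = 0.922, "chloroform" = 0.961,
  "2-propanol" = 0.967, "1-propanol" = 0.986, "ethanol" = 1.207,
  "acetone" = 1.337, "methanol" = 1.401, "cyclohexanone" = 1.437,
  "THF" = 2.367, "DMSO" = 8.423
)

# Solvents excluded for the three reasons given in the text
dimerizing  <- c("cyclohexane", "hexane", "carbon tetrachloride",
                 "m-xylene", "toluene", "benzene")
no_params   <- c("pentachloroethane", "tetrachloroethane",
                 "tetrachloroethylene", "trichloroethylene", "propyl acetate")
needs_update <- "trifluoroethanol"
excluded <- c(dimerizing, no_params, needs_update)

# Water is the reference (logC_w), so it is not a regression row either
kept <- conc[!(names(conc) %in% c(excluded, "water"))]
logC <- log10(kept)

length(kept)   # number of organic solvents left for the regression
```

With these exclusions, 20 organic solvents remain alongside the aqueous reference value.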

Linear regression was performed using R 2.13.0 on the three files E = 1.303, E = 1.143, E = FLOAT with the following code:
# first row contains variable names, comma is separator
# the solvent column supplies the row names
 
mydata <- read.table("RegressionE1.303.csv", header=TRUE, sep=",", row.names="solvent")
 
# Multiple linear regression without intercept; E is held fixed at 1.303
fit <- lm(Y ~ 0 + s + a + b, data=mydata)
 
# show results
summary(fit)
 
# Results: S=1.07324, A=0.37221, B=0.52214
 
mydata <- read.table("RegressionE1.143.csv", header=TRUE, sep=",", row.names="solvent")
 
# Multiple linear regression without intercept; E is held fixed at 1.143
fit <- lm(Y ~ 0 + s + a + b, data=mydata)
 
# show results
summary(fit)
 
# Results: S=1.05364, A=0.37571, B=0.51084
 
mydata <- read.table("RegressionEfloat.csv", header=TRUE, sep=",", row.names="solvent")
 
# Multiple linear regression without intercept; E is fitted as a free coefficient
fit <- lm(Y ~ 0 + e + s + a + b, data=mydata)
 
# show results
summary(fit)
 
Residuals:
     Min       1Q   Median       3Q      Max
-0.15769 -0.07673 -0.01604  0.05414  0.26691
 
Coefficients:
  Estimate Std. Error t value Pr(>|t|)
e  0.78045    0.29774   2.621   0.0185 *
s  1.00922    0.06230  16.198 2.40e-11 ***
a  0.38364    0.02522  15.212 6.19e-11 ***
b  0.48524    0.02352  20.627 5.94e-13 ***
 
# Results: E=0.78045, S=1.00922, A=0.38364, B=0.48524
Allowing E to float during the regression resulted in E = 0.78045. This is not close to either the value determined via fragments or the value predicted from ChemSpider. We therefore decided to take the value from ChemSpider as the value for E, as we do for all our webservices, giving the following values for the other Abraham descriptors: S = 1.07324, A = 0.37221, B = 0.52214.
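The structure of the fixed-E regression can be illustrated with synthetic data. The contents of the CSV files are not shown on this page, so the sketch below assumes that the response Y is the part of logC_s left after subtracting the known terms, Y = logC_s − logC_w − c − e·E − v·V, and that the columns s, a, and b hold the solvent coefficients. The solvent coefficients here are random numbers, not real Abraham parameters; the point is only that a no-intercept fit returns the solute descriptors as its coefficients.

```r
set.seed(1)

# SYNTHETIC solvent coefficients for 20 hypothetical solvents
n <- 20
mydata <- data.frame(
  s = runif(n, -1, 1),
  a = runif(n, -1, 1),
  b = runif(n, -5, 0)
)

# Descriptors we pretend are unknown (rounded values from this page)
true_desc <- c(S = 1.07, A = 0.37, B = 0.52)

# Assumed response: what remains of logC_s once the known terms are removed
mydata$Y <- mydata$s * true_desc[["S"]] +
  mydata$a * true_desc[["A"]] +
  mydata$b * true_desc[["B"]]

# Same model formula as in the regressions above: no intercept
fit <- lm(Y ~ 0 + s + a + b, data = mydata)
est <- coef(fit)
print(est)
```

Because the synthetic response contains no noise, the fitted coefficients reproduce the chosen descriptors; with measured solubilities, the residuals and standard errors reported by summary(fit) indicate how well the descriptors are determined.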

Results

Comparing the predicted solubility (using E = 1.303, S = 1.073, A = 0.372, B = 0.522, V = 1.171) with the measured values, we find an average absolute error (AAE) of 0.09 log units (0.30 M). Clear outliers are chloroform, nitrobenzene, and DMSO. Without these outliers the AAE falls to 0.06 log units (0.10 M). The large discrepancies for the outliers may be due more to trans-cinnamic acid lying outside the chemical space for which the Abraham solvent parameters were determined than to any error in measurement (ResultsE1.303.xlsx).
solvent             LogC_s  Pred LogC_s  AE (log units)  C_s (M)  Pred C_s (M)  AE (M)
1,2-dichloroethane  -0.441  -0.363       0.079           0.362    0.434         0.072
1-butanol           -0.113  -0.105       0.008           0.771    0.785         0.014
1-octanol           -0.270  -0.195       0.075           0.537    0.638         0.101
1-pentanol          -0.140  -0.059       0.081           0.725    0.874         0.149
1-propanol          -0.006  -0.042       0.036           0.986    0.907         0.079
2-butanol           -0.152  -0.139       0.013           0.705    0.727         0.022
2-pentanol          -0.035  -0.077       0.042           0.922    0.837         0.085
2-propanol          -0.015  -0.102       0.087           0.967    0.791         0.176
acetone             0.126   0.101        0.026           1.337    1.261         0.076
acetonitrile        -0.545  -0.470       0.075           0.285    0.339         0.054
chlorobenzene       -0.503  -0.630       0.127           0.314    0.234         0.080
chloroform          -0.017  -0.348       0.331           0.961    0.449         0.512
cyclohexanone       0.157   0.158        0.000           1.437    1.438         0.001
diethyl ether       -0.240  -0.173       0.067           0.575    0.671         0.096
DMSO                0.925   0.717        0.208           8.423    5.217         3.206
ethanol             0.082   0.088        0.006           1.207    1.224         0.017
ethyl acetate       -0.111  -0.028       0.082           0.775    0.937         0.162
methanol            0.146   0.060        0.086           1.401    1.149         0.252
nitrobenzene        -0.368  -0.069       0.299           0.429    0.853         0.424
THF                 0.374   0.434        0.060           2.367    2.716         0.349
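The error summary can be reproduced from the tabulated values with a few lines of R. This is a sketch using the measured and predicted LogC_s columns as printed on this page (rounded to three decimals), with variable names of our own choosing.

```r
solvent <- c("1,2-dichloroethane", "1-butanol", "1-octanol", "1-pentanol",
             "1-propanol", "2-butanol", "2-pentanol", "2-propanol",
             "acetone", "acetonitrile", "chlorobenzene", "chloroform",
             "cyclohexanone", "diethyl ether", "DMSO", "ethanol",
             "ethyl acetate", "methanol", "nitrobenzene", "THF")

# Measured and predicted logC_s as tabulated
measured  <- c(-0.441, -0.113, -0.270, -0.140, -0.006, -0.152, -0.035,
               -0.015, 0.126, -0.545, -0.503, -0.017, 0.157, -0.240,
               0.925, 0.082, -0.111, 0.146, -0.368, 0.374)
predicted <- c(-0.363, -0.105, -0.195, -0.059, -0.042, -0.139, -0.077,
               -0.102, 0.101, -0.470, -0.630, -0.348, 0.158, -0.173,
               0.717, 0.088, -0.028, 0.060, -0.069, 0.434)

abs_err <- abs(measured - predicted)

# Average absolute error over all 20 solvents
aae_all <- mean(abs_err)

# Average absolute error without the three outliers named in the text
outliers <- c("chloroform", "nitrobenzene", "DMSO")
aae_trim <- mean(abs_err[!(solvent %in% outliers)])

print(round(c(all = aae_all, without_outliers = aae_trim), 2))
```

Both figures agree with the 0.09 and 0.06 log units quoted in the Results paragraph.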

Conclusion

The Abraham general solvation model has been used to determine the Abraham descriptors for trans-cinnamic acid from measured solubility values. These descriptors can be used in turn to predict the solubility of trans-cinnamic acid in the ever increasing list of solvents with known Abraham parameters.

References

[1] Abraham MH, McGowan JC. 1987. The use of characteristic volumes to measure cavity terms in reversed phase liquid chromatography. Chromatographia, Volume 23, Number 4. pp. 243-246. DOI: 10.1007/BF02311772
[2] Abraham MH, Ibrahim A, Zissimos AM. 2004. The determination of sets of solute descriptors from chromatographic measurements. J Chromatogr A 1037:29–47.
[3] Zissimos AM, Abraham MH, et al. 2002. Calculation of Abraham descriptors from experimental data from seven HPLC systems; evaluation of five different methods of calculation. J. Chem. Soc. Perkin Transactions 2. pp. 2001-2010. ISSN 1472-779X
[4] Abraham MH, et al. 2009. Prediction of Solubility of Drugs and Other Compounds in Organic Solvents. Journal of Pharmaceutical Sciences. DOI: 10.1002/jps.21922