## Determination of the Abraham model solute descriptors for trans-cinnamic acid from measured solubilities

Professors: Michael H. Abraham, William E. Acree Jr., Jean-Claude Bradley, Andrew S.I.D. Lang

## Abstract

The Abraham model solute descriptors can be used in many applications, including the prediction of solubility in the growing number of solvents that have Abraham parameters. This directly benefits solvent selection for tasks such as reaction design or recrystallization. Using solubility measurements from the literature and those collected as part of the Open Notebook Solubility Challenge, we have calculated the descriptors for trans-cinnamic acid to be E = 1.303, S = 1.073, A = 0.372, B = 0.522, and V = 1.171.

## Introduction

The solubility (base 10 logarithm of the molar concentration) of a compound in various organic solvents can be approximated using Abraham's general solvation equation:

logC_s = logC_w + c + e * E + s * S + a * A + b * B + v * V

where logC_w is the base 10 logarithm of the aqueous solubility; E, S, A, B, and V are the solute-specific Abraham descriptors; and c, e, s, a, b, and v are the solvent-specific Abraham parameters.
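As a minimal sketch of how the equation is applied, the R function below evaluates it for one solute/solvent pair. The function name `predict_log_cs` and the solvent coefficients in the usage lines are illustrative placeholders, not published Abraham parameters. The descriptors and the aqueous solubility (0.004 M) are the values reported in this article.

```r
# Evaluate the Abraham general solvation equation for one solute/solvent pair.
# descriptors: named numeric vector with E, S, A, B, V (solute)
# coefs:       named numeric vector with c, e, s, a, b, v (solvent)
predict_log_cs <- function(log_cw, descriptors, coefs) {
  log_cw + coefs[["c"]] +
    coefs[["e"]] * descriptors[["E"]] +
    coefs[["s"]] * descriptors[["S"]] +
    coefs[["a"]] * descriptors[["A"]] +
    coefs[["b"]] * descriptors[["B"]] +
    coefs[["v"]] * descriptors[["V"]]
}

# Descriptors for trans-cinnamic acid as reported in this article
cinnamic <- c(E = 1.303, S = 1.073, A = 0.372, B = 0.522, V = 1.171)

# Aqueous solubility from the solubility table (0.004 M)
log_cw <- log10(0.004)

# PLACEHOLDER solvent coefficients, for illustration only.
# Real values must come from a published table of Abraham solvent parameters.
placeholder_solvent <- c(c = 0, e = 0.5, s = -1, a = 0, b = -4, v = 4)

log_cs <- predict_log_cs(log_cw, cinnamic, placeholder_solvent)
cs_molar <- 10^log_cs  # back-transform to a molar concentration
```

Passing the descriptors and coefficients as named vectors keeps each term of the code visibly matched to a term of the equation.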

## Calculating the descriptors

Calculating V
The solute descriptor V, the McGowan characteristic volume, can be calculated directly from structure. It is equal to the McGowan characteristic volume (cubic cm per mol)/100. [1] For trans-cinnamic acid, V = 1.171.

Calculating E
The solute descriptor E, the excess molar refractivity, is defined as the molar refraction, MR, of the compound calculated using the McGowan volume, less the molar refraction of an alkane with the same McGowan volume. That is, E = MR − 2.83195 V + 0.52553, where MR = 10 V [(η^2 − 1)/(η^2 + 2)] and η is the refractive index. For solutes that are solid at room temperature, E can be calculated in three main ways:
1. From the ACD Labs predicted molar refractivity available from ChemSpider. Since Abraham descriptors are calculated using the McGowan volume, the ACD Labs value must first be converted using V (intrinsic) = 0.597 + 0.6823 V. [1] For trans-cinnamic acid, E = 1.303.
2. From structure by comparing the solute fragment-wise with compounds with known values for E. For the case of trans-cinnamic acid, comparison can be made with ethyl benzoate (E = 0.689), ethyl cinnamate (E = 1.102), and benzoic acid (E = 0.730). This gives E = 1.143 for cinnamic acid.
3. By letting E float. That is, determine E from regression, see 'Calculating S, A, and B' below.

Calculating S, A, and B
S (dipolarity/polarizability), A (hydrogen bond acidity), and B (hydrogen bond basicity) can be calculated using GLC [2], RP-HPLC [3], or linear regression from known solubility data or known logP values. [4] We collected solubility values and logP values from the literature and from Bio-Loom (trial version). All values (mole fraction, mass fraction, mass ratio, and logP) were converted to molarity for ease of comparison. None of the Bio-Loom logP values were included in the final regression, as many of them were found to be incorrect by several orders of magnitude.
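The unit conversions mentioned above can be sketched in R as follows. This is an illustration under stated assumptions, not the authors' own procedure. The function names are hypothetical. The solution density must be supplied by the user (measured, or approximated by the solvent density for dilute solutions). logP is treated as the logarithm of the ratio of the solubility in the organic solvent to the solubility in water.

```r
# Conversions to molarity (mol/L). Densities in g/mL, molar masses in g/mol.
M_CINNAMIC <- 148.16  # molar mass of trans-cinnamic acid, C9H8O2

# Mass fraction w = g solute / g solution.
mass_fraction_to_molar <- function(w, solution_density, molar_mass = M_CINNAMIC) {
  1000 * w * solution_density / molar_mass
}

# Mass ratio r = g solute / g solvent, converted to a mass fraction.
mass_ratio_to_mass_fraction <- function(r) {
  r / (1 + r)
}

# Mole fraction x of solute in a binary solution, converted to a mass fraction.
mole_fraction_to_mass_fraction <- function(x, solvent_molar_mass,
                                           molar_mass = M_CINNAMIC) {
  x * molar_mass / (x * molar_mass + (1 - x) * solvent_molar_mass)
}

# ASSUMPTION: logP = log10(C_solvent / C_water), so C_solvent = C_water * 10^logP.
logp_to_molar <- function(log_p, c_water) {
  c_water * 10^log_p
}
```

Mass ratios and mole fractions are first reduced to a mass fraction, so that only one function needs the solution density.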

Additional solubility data were determined in two ways, either using the density method or by using NMR, as part of the ONSChallenge (live data: solubility of cinnamic acid in various organic solvents).

The measured solubilities were combined with those collected from the literature, resulting in 74 solute/solvent values (molar concentrations) at temperatures ranging from 19.5 °C to 28 °C. The solubility values were all converted to values at 25 °C using the Buchowski equation with the assumption of miscibility at the solute melting point. Multiple measurements for the same solvent were averaged, giving a total of 33 solute/solvent values; see the table below (data as of 2012-02-09).

| Solvent | Molar concentration (mol/L) |
| --- | --- |
| water | 0.004 |
| cyclohexane | 0.027 |
| hexane | 0.045 |
| carbon tetrachloride | 0.117 |
| trifluoroethanol | 0.168 |
| tetrachloroethylene | 0.172 |
| m-xylene | 0.201 |
| toluene | 0.253 |
| acetonitrile | 0.285 |
| benzene | 0.303 |
| chlorobenzene | 0.314 |
| 1,2-dichloroethane | 0.362 |
| pentachloroethane | 0.374 |
| trichloroethylene | 0.408 |
| nitrobenzene | 0.429 |
| 1-octanol | 0.537 |
| diethyl ether | 0.575 |
| propyl acetate | 0.609 |
| 2-butanol | 0.705 |
| 1-pentanol | 0.725 |
| tetrachloroethane | 0.746 |
| 1-butanol | 0.771 |
| ethyl acetate | 0.775 |
| 2-pentanol | 0.922 |
| chloroform | 0.961 |
| 2-propanol | 0.967 |
| 1-propanol | 0.986 |
| ethanol | 1.207 |
| acetone | 1.337 |
| methanol | 1.401 |
| cyclohexanone | 1.437 |
| THF | 2.367 |
| DMSO | 8.423 |

Six solvents were identified as solvents in which trans-cinnamic acid may dimerize: cyclohexane, hexane, carbon tetrachloride, m-xylene, toluene, and benzene. These solvents were excluded from the regression. Pentachloroethane, tetrachloroethane, tetrachloroethylene, trichloroethylene, and propyl acetate were also excluded, as they do not currently have Abraham solvent parameters. One last solvent, trifluoroethanol, was excluded because a recent analysis has shown that its coefficients need updating.
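The regression input files are not reproduced in this article, so the following R sketch only illustrates how such a file could be prepared. It assumes that the response Y is the part of logC_s left over once the known terms of the solvation equation are moved to the left-hand side, and that water serves as the reference for logC_w rather than as a regression row. The names `build_response`, `c_coef`, `e_coef`, and `v_coef` are hypothetical. The solubilities are a few rows taken from the table above.

```r
# Solvents excluded from the regression, as listed in the text
excluded <- c(
  "cyclohexane", "hexane", "carbon tetrachloride",
  "m-xylene", "toluene", "benzene",                 # possible dimerization
  "pentachloroethane", "tetrachloroethane", "tetrachloroethylene",
  "trichloroethylene", "propyl acetate",            # no Abraham solvent parameters
  "trifluoroethanol"                                # coefficients need updating
)

# A few rows of the solubility table (molar concentrations)
solubility <- data.frame(
  solvent = c("water", "hexane", "trifluoroethanol", "ethanol", "acetone"),
  molar = c(0.004, 0.045, 0.168, 1.207, 1.337),
  stringsAsFactors = FALSE
)

# ASSUMPTION: water supplies logC_w and is not itself a regression row
log_cw <- log10(solubility$molar[solubility$solvent == "water"])
kept <- solubility[!(solubility$solvent %in% c(excluded, "water")), ]

# ASSUMPTION: Y = logC_s - logC_w - c - v*V, with e*E also subtracted when E is
# held fixed. Leave E = NULL when E is to be fitted by the regression.
build_response <- function(molar, log_cw, c_coef, e_coef, v_coef,
                           V = 1.171, E = NULL) {
  y <- log10(molar) - log_cw - c_coef - v_coef * V
  if (!is.null(E)) {
    y <- y - e_coef * E
  }
  y
}
```

With Y built this way, regressing it on the solvent coefficients s, a, and b (and e when E floats) with no intercept returns the solute descriptors as the fitted coefficients.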

Linear regression was performed using R 2.13.0 on the three files E = 1.303, E = 1.143, E = FLOAT with the following code:

```r
# Each CSV file has a header row with the variable names and uses a comma as separator.
# The "solvent" column is used for the row names.
# Note: use / instead of \ in file paths on MS Windows systems.

# E fixed at 1.303: multiple linear regression on s, a, b with no intercept
mydata <- read.table("RegressionE1.303.csv", header=TRUE, sep=",", row.names="solvent")
fit <- lm(Y ~ 0 + s + a + b, data=mydata)
# show results
summary(fit)
# Results: S=1.07324, A=0.37221, B=0.52214

# E fixed at 1.143
mydata <- read.table("RegressionE1.143.csv", header=TRUE, sep=",", row.names="solvent")
fit <- lm(Y ~ 0 + s + a + b, data=mydata)
# show results
summary(fit)
# Results: S=1.05364, A=0.37571, B=0.51084

# E allowed to float: e is added as a regressor
mydata <- read.table("RegressionEfloat.csv", header=TRUE, sep=",", row.names="solvent")
fit <- lm(Y ~ 0 + e + s + a + b, data=mydata)
# show results
summary(fit)
# Results: E=0.78045, S=1.00922, A=0.38364, B=0.48524
```

Output of summary(fit) for the regression with E floating:

```
Residuals:
     Min       1Q   Median       3Q      Max
-0.15769 -0.07673 -0.01604  0.05414  0.26691

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
e  0.78045    0.29774   2.621   0.0185 *
s  1.00922    0.06230  16.198 2.40e-11 ***
a  0.38364    0.02522  15.212 6.19e-11 ***
b  0.48524    0.02352  20.627 5.94e-13 ***
```

Allowing E to float during the regression gave E = 0.78045. This is not close to either the value determined via fragments (E = 1.143) or the value derived from the ChemSpider prediction (E = 1.303). We therefore decided to take the value from ChemSpider as the value for E, as we do for all our webservices, giving the following values for the other Abraham descriptors: S = 1.07324, A = 0.37221, B = 0.52214.

## Results

Comparing the predicted solubility (using E: 1.303 S: 1.073 A: 0.372 B: 0.522 V: 1.171) with the measured values, we find an average absolute error (AAE) of 0.09 log units (0.30 M). Clear outliers are chloroform, nitrobenzene, and DMSO. Without these outliers the AAE falls to 0.06 log units (0.10 M). The large discrepancies for the outliers may be due more to trans-cinnamic acid lying outside the chemical space for which the Abraham solvent parameters were determined than to any error in measurement (ResultsE1.303.xlsx).
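For reference, the AAE can be computed as in the R sketch below. The function names are hypothetical, and the predicted values are made-up offsets used only to exercise the code, not the predictions reported here. The molar-unit version assumes that the figure quoted in M is the mean absolute difference of the back-transformed concentrations.

```r
# Average absolute error in log units
aae_log <- function(pred_log, meas_log) {
  mean(abs(pred_log - meas_log))
}

# ASSUMPTION: the AAE in molar units is the mean absolute difference of the
# back-transformed (10^log) concentrations
aae_molar <- function(pred_log, meas_log) {
  mean(abs(10^pred_log - 10^meas_log))
}

# Measured values from the table: acetonitrile, ethanol, acetone
meas_log <- log10(c(0.285, 1.207, 1.337))

# MADE-UP prediction errors, for illustration only
pred_log <- meas_log + c(0.05, -0.10, 0.02)

aae_log(pred_log, meas_log)
aae_molar(pred_log, meas_log)

# Recomputing without a suspected outlier (here the second entry)
aae_log(pred_log[-2], meas_log[-2])
```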

## Conclusion

The Abraham general solvation model has been used to determine the Abraham descriptors for trans-cinnamic acid from measured solubility values. These descriptors can in turn be used to predict the solubility of trans-cinnamic acid in the ever-increasing list of solvents with known Abraham parameters.

## References

[1] Abraham MH, McGowan JC. 1987. The use of characteristic volumes to measure cavity terms in reversed phase liquid chromatography. Chromatographia 23(4):243–246. DOI: 10.1007/BF02311772

[2] Abraham MH, Ibrahim A, Zissimos AM. 2004. The determination of sets of solute descriptors from chromatographic measurements. J Chromatogr A 1037:29–47.

[3] Zissimos AM, Abraham MH, et al. 2002. Calculation of Abraham descriptors from experimental data from seven HPLC systems; evaluation of five different methods of calculation. J. Chem. Soc. Perkin Transactions 2. pp. 2001–2010. ISSN 1472-779X

[4] Abraham MH, et al. 2009. Prediction of Solubility of Drugs and Other Compounds in Organic Solvents. Journal of Pharmaceutical Sciences. DOI: 10.1002/jps.21922