
Determination of the Abraham model solute descriptors for trans-cinnamic acid from measured solubilities

**Professors: Michael H. Abraham, William E. Acree Jr., Jean-Claude Bradley, Andrew S.I.D. Lang**

Abstract
The Abraham model solute descriptors have many applications, including predicting solubility in the growing number of solvents that have Abraham parameters. This is directly useful for solvent selection in tasks such as reaction design and recrystallization. Using solubility measurements from the literature and those collected as part of the Open Notebook Solubility Challenge, we have calculated the descriptors for trans-cinnamic acid to be: **E = 1.303, S = 1.073, A = 0.372, B = 0.522, V = 1.171**

Introduction
The solubility (base-10 logarithm of the molar concentration) of a compound in an organic solvent can be approximated using Abraham's general solvation equation:

logC_s = logC_w + c + e*E + s*S + a*A + b*B + v*V

where logC_s and logC_w are the base-10 logarithms of the molar solubility in the organic solvent and in water, respectively; E, S, A, B, and V are the solute-specific Abraham descriptors; and c, e, s, a, b, and v are the solvent-specific Abraham parameters.
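As a minimal sketch, the equation above can be written as an R function that predicts logC_s for one solvent. The function name `predict_logCs` is hypothetical, and the solvent coefficients in `solvent_demo` are made-up placeholders, not published Abraham parameters for any real solvent; the tabulated c, e, s, a, b, v values for the solvent of interest would be substituted in practice.

```r
# Abraham general solvation equation for solubility in an organic solvent:
# logC_s = logC_w + c + e*E + s*S + a*A + b*B + v*V
# 'solute' is a named vector of descriptors E, S, A, B, V;
# 'solvent' is a named vector of coefficients c, e, s, a, b, v.
predict_logCs <- function(logCw, solute, solvent) {
  logCw + solvent[["c"]] +
    solvent[["e"]] * solute[["E"]] +
    solvent[["s"]] * solute[["S"]] +
    solvent[["a"]] * solute[["A"]] +
    solvent[["b"]] * solute[["B"]] +
    solvent[["v"]] * solute[["V"]]
}

# trans-cinnamic acid descriptors reported on this page
cinnamic <- c(E = 1.303, S = 1.073, A = 0.372, B = 0.522, V = 1.171)

# HYPOTHETICAL solvent coefficients, for illustration only
solvent_demo <- c(c = 0.1, e = 0.2, s = -0.5, a = 0.3, b = -3.0, v = 3.5)

logCw <- log10(0.004)   # rounded aqueous molar solubility used on this page
logCs <- predict_logCs(logCw, cinnamic, solvent_demo)
logCs                   # predicted log10 molar solubility
10^logCs                # predicted molar solubility (mol/L)
```

Keeping the descriptors and coefficients as named vectors means the same function can be applied across a table of solvents without relying on column order.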

Calculating the descriptors
**Calculating V**
The solute descriptor V can be calculated directly from structure: it is the McGowan characteristic volume (in cm³/mol) divided by 100. [1] For trans-cinnamic acid, **V = 1.171**.

**Calculating E**
The solute descriptor E, the excess molar refractivity, is defined as the molar refraction, MR, of the compound (calculated using the McGowan volume) minus the molar refraction of an alkane with the same McGowan volume. That is, E = MR − 2.83195V + 0.52553, where MR = 10V(η² − 1)/(η² + 2) and η is the refractive index. For solutes that are solid at room temperature, E can be obtained in three main ways:
1. From the ACD Labs predicted molar refractivity available from ChemSpider. Since Abraham descriptors are calculated using the McGowan volume, the ACD Labs value must first be converted using V(intrinsic) = 0.597 + 0.6823V. [1] For trans-cinnamic acid this gives **E = 1.303**.
2. From structure, by comparing the solute fragment-wise with compounds of known E. For trans-cinnamic acid, comparison with ethyl cinnamate (E = 1.102), ethyl benzoate (E = 0.689), and benzoic acid (E = 0.730) gives E(cinnamic acid) ≈ 1.102 − 0.689 + 0.730 = **1.143**.
3. By letting E float, that is, determining E from the regression described under "Calculating S, A, and B".

**Calculating S, A, and B**
S (dipolarity/polarizability), A (hydrogen-bond acidity), and B (hydrogen-bond basicity) can be determined from GLC [2] or RP-HPLC [3] measurements, or by multiple linear regression on known solubility data or known logP values. [4] We collected solubility and logP values from the literature and from Bio-Loom (trial version). All values (mole fraction, mass fraction, mass ratio, and logP) were converted to molarity for ease of comparison. None of the Bio-Loom logP values were included in the final regression, because many of them were found to be off by several orders of magnitude.
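The V and E arithmetic described above can be checked with a short R sketch. It assumes the standard McGowan atomic increments (C 16.35, H 8.71, O 12.43 cm³/mol, with 6.56 cm³/mol subtracted per bond, each bond counted once regardless of bond order) and counts bonds as atoms − 1 + rings. The function names are hypothetical, and no refractive index for trans-cinnamic acid is given on this page, so `E_from_refractive_index` is shown only as the formula.

```r
# McGowan atomic increments in cm^3/mol (only the elements needed here)
mcgowan_increments <- c(C = 16.35, H = 8.71, O = 12.43)

# Abraham V = McGowan characteristic volume / 100
mcgowan_V <- function(atoms, rings) {
  n_atoms <- sum(atoms)
  n_bonds <- n_atoms - 1 + rings   # every bond counted once, whatever its order
  volume <- sum(mcgowan_increments[names(atoms)] * atoms) - 6.56 * n_bonds
  volume / 100
}

# E = MR - 2.83195*V + 0.52553, with MR = 10*V*(eta^2 - 1)/(eta^2 + 2)
E_from_refractive_index <- function(eta, V) {
  MR <- 10 * V * (eta^2 - 1) / (eta^2 + 2)
  MR - 2.83195 * V + 0.52553
}

# trans-cinnamic acid: C9H8O2 with one ring (19 atoms, 19 bonds)
V <- mcgowan_V(c(C = 9, H = 8, O = 2), rings = 1)
V            # 1.1705, reported as 1.171

# Fragment estimate: cinnamic acid ~ ethyl cinnamate - ethyl benzoate + benzoic acid
E_fragment <- 1.102 - 0.689 + 0.730
E_fragment   # 1.143
```

The fragment estimate works because the ethyl ester contribution cancels between ethyl cinnamate and ethyl benzoate, leaving the vinylene (–CH=CH–) increment to add to benzoic acid.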

Additional solubility data were measured in two ways, by the density method or by NMR, as part of the ONSChallenge (live data page: "solubility of cinnamic acid in various organic solvents").
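The conversion of mole fractions, mass fractions, mass ratios, and logP values to molarity mentioned earlier can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure used on this page: it assumes the density of the saturated solution (g/mL) is known (substituting the pure-solvent density is only an approximation), and it assumes a water-to-solvent logP converts to solubility as logC_s = logC_w + logP. All function names and the example density are hypothetical; 148.16 g/mol is the molar mass of C9H8O2 and 46.07 g/mol that of ethanol.

```r
M_cinnamic <- 148.16   # g/mol, trans-cinnamic acid (C9H8O2)

# Mass fraction -> molarity: (g solution per L) * w / (g per mol)
molarity_from_mass_fraction <- function(w, rho, M_solute = M_cinnamic) {
  1000 * rho * w / M_solute
}

# Mole fraction -> mass fraction, given the solvent molar mass
mass_fraction_from_mole_fraction <- function(x, M_solvent, M_solute = M_cinnamic) {
  x * M_solute / (x * M_solute + (1 - x) * M_solvent)
}

# Mass ratio (g solute per g solvent) -> mass fraction
mass_fraction_from_mass_ratio <- function(r) r / (1 + r)

# ASSUMPTION: solubility ratio C_s / C_w approximated by the partition coefficient P
logCs_from_logP <- function(logP, logCw) logCw + logP

# Hypothetical example: mole fraction 0.05 in ethanol, assumed density 0.85 g/mL
w <- mass_fraction_from_mole_fraction(0.05, M_solvent = 46.07)
molarity_from_mass_fraction(w, rho = 0.85)
```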

The measured solubilities were combined with those collected from the literature, giving 74 solute/solvent values (molar concentrations) at temperatures from 19.5 °C to 28 °C. All solubility values were converted to 25 °C using the Buchowski equation, assuming miscibility at the solute melting point. Multiple measurements in the same solvent were averaged, giving a total of 33 solute/solvent values (see the table below, taken 2012-02-09). Six solvents were identified in which trans-cinnamic acid may dimerize: cyclohexane, hexane, carbon tetrachloride, m-xylene, toluene, and benzene. These solvents were excluded from the regression. Pentachloroethane, tetrachloroethane, tetrachloroethylene, trichloroethylene, and propyl acetate were also excluded because they currently lack Abraham solvent parameters. Finally, trifluoroethanol was excluded because a recent analysis has shown that its coefficients need updating.
|| solvent || molar concentration (mol/L, 25 °C) ||
|| water || 0.004 ||
|| cyclohexane || 0.027 ||
|| hexane || 0.045 ||
|| carbon tetrachloride || 0.117 ||
|| trifluoroethanol || 0.168 ||
|| tetrachloroethylene || 0.172 ||
|| m-xylene || 0.201 ||
|| toluene || 0.253 ||
|| acetonitrile || 0.285 ||
|| benzene || 0.303 ||
|| chlorobenzene || 0.314 ||
|| 1,2-dichloroethane || 0.362 ||
|| pentachloroethane || 0.374 ||
|| trichloroethylene || 0.408 ||
|| nitrobenzene || 0.429 ||
|| 1-octanol || 0.537 ||
|| diethyl ether || 0.575 ||
|| propyl acetate || 0.609 ||
|| 2-butanol || 0.705 ||
|| 1-pentanol || 0.725 ||
|| tetrachloroethane || 0.746 ||
|| 1-butanol || 0.771 ||
|| ethyl acetate || 0.775 ||
|| 2-pentanol || 0.922 ||
|| chloroform || 0.961 ||
|| 2-propanol || 0.967 ||
|| 1-propanol || 0.986 ||
|| ethanol || 1.207 ||
|| acetone || 1.337 ||
|| methanol || 1.401 ||
|| cyclohexanone || 1.437 ||
|| THF || 2.367 ||
|| DMSO || 8.423 ||
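The solvent selection described above can be reproduced from the table in a few lines of R. This sketch copies the rounded table values and applies the stated exclusions; the resulting log10(C_s/C_w) column is the water-to-solvent term from which, consistent with the `lm` formulas used below, the regression response would be formed by also subtracting each solvent's c, e*E and v*V terms (solvent coefficients are not reproduced on this page). Variable names are illustrative.

```r
# Molar solubilities of trans-cinnamic acid at 25 C, copied from the table above
sol <- data.frame(
  solvent = c("water", "cyclohexane", "hexane", "carbon tetrachloride",
              "trifluoroethanol", "tetrachloroethylene", "m-xylene", "toluene",
              "acetonitrile", "benzene", "chlorobenzene", "1,2-dichloroethane",
              "pentachloroethane", "trichloroethylene", "nitrobenzene", "1-octanol",
              "diethyl ether", "propyl acetate", "2-butanol", "1-pentanol",
              "tetrachloroethane", "1-butanol", "ethyl acetate", "2-pentanol",
              "chloroform", "2-propanol", "1-propanol", "ethanol", "acetone",
              "methanol", "cyclohexanone", "THF", "DMSO"),
  C = c(0.004, 0.027, 0.045, 0.117, 0.168, 0.172, 0.201, 0.253, 0.285, 0.303,
        0.314, 0.362, 0.374, 0.408, 0.429, 0.537, 0.575, 0.609, 0.705, 0.725,
        0.746, 0.771, 0.775, 0.922, 0.961, 0.967, 0.986, 1.207, 1.337, 1.401,
        1.437, 2.367, 8.423),
  stringsAsFactors = FALSE
)

excluded <- c(
  # possible dimerization of the acid
  "cyclohexane", "hexane", "carbon tetrachloride", "m-xylene", "toluene", "benzene",
  # no Abraham solvent parameters available
  "pentachloroethane", "tetrachloroethane", "tetrachloroethylene",
  "trichloroethylene", "propyl acetate",
  # coefficients in need of updating
  "trifluoroethanol"
)

logCw <- log10(sol$C[sol$solvent == "water"])
reg <- sol[!(sol$solvent %in% c(excluded, "water")), ]
reg$logCs <- log10(reg$C)
reg$log_ratio <- reg$logCs - logCw   # log10(C_s / C_w)
nrow(reg)                            # solvents left for the regression
```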

Linear regression was performed using R 2.13.0 on the three data files (RegressionE1.303.csv, RegressionE1.143.csv, and RegressionEfloat.csv) with the following code:

```r
# Each CSV file: the first row contains the variable names, comma is the separator,
# and the "solvent" column is used as the row names.
# Note: use / instead of \ in file paths on MS Windows systems.

# 1) E fixed at the ChemSpider (ACD Labs) value, E = 1.303
mydata <- read.table("RegressionE1.303.csv", header=TRUE, sep=",", row.names="solvent")
fit <- lm(Y ~ 0 + s + a + b, data=mydata)   # multiple linear regression through the origin
summary(fit)                                 # show results
# Results: S=1.07324, A=0.37221, B=0.52214

# 2) E fixed at the fragment value, E = 1.143
mydata <- read.table("RegressionE1.143.csv", header=TRUE, sep=",", row.names="solvent")
fit <- lm(Y ~ 0 + s + a + b, data=mydata)
summary(fit)
# Results: S=1.05364, A=0.37571, B=0.51084

# 3) E allowed to float (fitted together with S, A and B)
mydata <- read.table("RegressionEfloat.csv", header=TRUE, sep=",", row.names="solvent")
fit <- lm(Y ~ 0 + e + s + a + b, data=mydata)
summary(fit)
# Results: E=0.78045, S=1.00922, A=0.38364, B=0.48524
```

Output of summary(fit) for the floating-E regression (excerpt):

```
Residuals:
     Min       1Q   Median       3Q      Max
-0.15769 -0.07673 -0.01604  0.05414  0.26691

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
e  0.78045    0.29774   2.621   0.0185 *
s  1.00922    0.06230  16.198 2.40e-11 ***
a  0.38364    0.02522  15.212 6.19e-11 ***
b  0.48524    0.02352  20.627 5.94e-13 ***
```

Allowing E to float gave E = 0.78045 (standard error 0.29774), far from both the fragment estimate (1.143) and the ChemSpider-based value (1.303). We therefore took the ChemSpider value for E, as we do for all our web services, which gives S = 1.07324, A = 0.37221, and B = 0.52214.
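The three fits above are regressions through the origin (`0 +` in the formula). That is consistent with a response Y from which all known terms have already been subtracted: with E fixed, Y = logC_s − logC_w − c − e*E − v*V, leaving Y = s*S + a*A + b*B, so the fitted coefficients are the descriptors themselves. The sketch below illustrates this on synthetic data; the solvent coefficients are randomly generated (not real Abraham parameters), `make_Y` is a hypothetical helper, and the page's CSV files are not reproduced.

```r
set.seed(1)
n <- 20   # same number of solvents as in the regression above

# Randomly generated stand-ins for solvent coefficients (NOT real Abraham values)
solvents <- data.frame(
  c = runif(n, -0.5, 0.5), e = runif(n, -0.5, 0.5), s = runif(n, -1.5, 0.5),
  a = runif(n, -3.5, 0.5), b = runif(n, -5, 0), v = runif(n, 2.5, 4.5)
)
desc_true <- c(E = 1.303, S = 1.073, A = 0.372, B = 0.522, V = 1.171)
logCw <- log10(0.004)

# Synthetic "measured" solubilities from the Abraham equation plus small noise
solvents$logCs <- with(solvents,
  logCw + c + e * desc_true[["E"]] + s * desc_true[["S"]] +
    a * desc_true[["A"]] + b * desc_true[["B"]] + v * desc_true[["V"]] +
    rnorm(n, sd = 0.01))

# Hypothetical helper: remove every term that does not involve S, A or B
make_Y <- function(d, logCw, E, V) d$logCs - logCw - d$c - d$e * E - d$v * V

solvents$Y <- make_Y(solvents, logCw, E = desc_true[["E"]], V = desc_true[["V"]])
fit <- lm(Y ~ 0 + s + a + b, data = solvents)
coef(fit)   # estimates of S, A and B
```

Letting E float corresponds to leaving the e*E term inside Y and adding `e` to the formula, as in the third fit above.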

Results
Comparing the predicted solubilities (using E = 1.303, S = 1.073, A = 0.372, B = 0.522, V = 1.171) with the measured values gives an average absolute error (AAE) of 0.09 log units (0.30 M). Chloroform, nitrobenzene, and DMSO are clear outliers; without them the AAE falls to 0.06 log units (0.10 M). The large discrepancies for these three solvents may stem from trans-cinnamic acid lying outside the chemical space for which the Abraham solvent parameters were determined, rather than from measurement error.
|| solvent || logC_s (measured) || logC_s (predicted) || AE (log units) || C_s (M, measured) || C_s (M, predicted) || AE (M) ||
|| 1,2-dichloroethane || -0.441 || -0.363 || 0.079 || 0.362 || 0.434 || 0.072 ||
|| 1-butanol || -0.113 || -0.105 || 0.008 || 0.771 || 0.785 || 0.014 ||
|| 1-octanol || -0.270 || -0.195 || 0.075 || 0.537 || 0.638 || 0.101 ||
|| 1-pentanol || -0.140 || -0.059 || 0.081 || 0.725 || 0.874 || 0.149 ||
|| 1-propanol || -0.006 || -0.042 || 0.036 || 0.986 || 0.907 || 0.079 ||
|| 2-butanol || -0.152 || -0.139 || 0.013 || 0.705 || 0.727 || 0.022 ||
|| 2-pentanol || -0.035 || -0.077 || 0.042 || 0.922 || 0.837 || 0.085 ||
|| 2-propanol || -0.015 || -0.102 || 0.087 || 0.967 || 0.791 || 0.176 ||
|| acetone || 0.126 || 0.101 || 0.026 || 1.337 || 1.261 || 0.076 ||
|| acetonitrile || -0.545 || -0.470 || 0.075 || 0.285 || 0.339 || 0.054 ||
|| chlorobenzene || -0.503 || -0.630 || 0.127 || 0.314 || 0.234 || 0.080 ||
|| chloroform || -0.017 || -0.348 || 0.331 || 0.961 || 0.449 || 0.512 ||
|| cyclohexanone || 0.157 || 0.158 || 0.000 || 1.437 || 1.438 || 0.001 ||
|| diethyl ether || -0.240 || -0.173 || 0.067 || 0.575 || 0.671 || 0.096 ||
|| DMSO || 0.925 || 0.717 || 0.208 || 8.423 || 5.217 || 3.206 ||
|| ethanol || 0.082 || 0.088 || 0.006 || 1.207 || 1.224 || 0.017 ||
|| ethyl acetate || -0.111 || -0.028 || 0.082 || 0.775 || 0.937 || 0.162 ||
|| methanol || 0.146 || 0.060 || 0.086 || 1.401 || 1.149 || 0.252 ||
|| nitrobenzene || -0.368 || -0.069 || 0.299 || 0.429 || 0.853 || 0.424 ||
|| THF || 0.374 || 0.434 || 0.060 || 2.367 || 2.716 || 0.349 ||
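The AAE figures quoted above can be recomputed from the two log columns of the results table. This sketch uses the rounded table values, so individual absolute errors can differ from the AE column in the last digit (for example 0.078 rather than 0.079 for 1,2-dichloroethane), but the averages round to the same 0.09 and 0.06 log units. The molar AAE is obtained the same way from the concentration columns.

```r
# Measured and predicted log10 molar solubilities from the results table above
res <- data.frame(
  solvent = c("1,2-dichloroethane", "1-butanol", "1-octanol", "1-pentanol",
              "1-propanol", "2-butanol", "2-pentanol", "2-propanol", "acetone",
              "acetonitrile", "chlorobenzene", "chloroform", "cyclohexanone",
              "diethyl ether", "DMSO", "ethanol", "ethyl acetate", "methanol",
              "nitrobenzene", "THF"),
  logC = c(-0.441, -0.113, -0.270, -0.140, -0.006, -0.152, -0.035, -0.015, 0.126,
           -0.545, -0.503, -0.017, 0.157, -0.240, 0.925, 0.082, -0.111, 0.146,
           -0.368, 0.374),
  pred = c(-0.363, -0.105, -0.195, -0.059, -0.042, -0.139, -0.077, -0.102, 0.101,
           -0.470, -0.630, -0.348, 0.158, -0.173, 0.717, 0.088, -0.028, 0.060,
           -0.069, 0.434),
  stringsAsFactors = FALSE
)

abs_err <- abs(res$logC - res$pred)          # absolute error per solvent, log units
aae_all <- mean(abs_err)                     # average absolute error, all 20 solvents
outliers <- c("chloroform", "nitrobenzene", "DMSO")
aae_trimmed <- mean(abs_err[!(res$solvent %in% outliers)])
round(c(all = aae_all, without_outliers = aae_trimmed), 2)
```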

Conclusion
The Abraham general solvation model has been used to determine the Abraham descriptors for trans-cinnamic acid from measured solubility values. These descriptors can in turn be used to predict the solubility of trans-cinnamic acid in the ever-increasing list of solvents with known Abraham parameters.