Modeling

Existing models of non-aqueous solubility

Abolghasem Jouyban and William E. Acree Jr. "Solubility prediction in non-aqueous binary solvents using a combination of Jouyban-Acree and Abraham models", Fluid Phase Equilibria, Volume 249, Issues 1-2, 15 November 2006, Pages 24-32 (http://dx.doi.org/10.1016/j.fluid.2006.08.016)

While aqueous solubility has been extensively studied using a variety of experimental and computational methods, far fewer studies consider non-aqueous solvents. For example, Kundu et al. reported an approach to predict the solubility of CO2 in aqueous alkanolamines [Kundu2008]. The model used literature data to evaluate activity coefficients based on the modified Clegg-Pitzer equations and then employed a differential evolution algorithm to perform parameter optimization.

Jouyban et al. [Jouyban2002] described an extension of the Williams and Amidon excess free-energy approach [Williams1984] to predict solubility in binary solvents. While the main focus was the solubility of drugs under physiological conditions, they also considered non-aqueous solvent mixtures, though for only a single solute (oxolinic acid).

Hilal et al. [Hilal2004] described an approach, termed SPARC, which attempts to model physical properties and chemical reactivity parameters using a set of "core models". These core models were designed to address both inter- and intra-molecular interactions, via sums of free energy changes, along with empirically derived values for properties such as polarizability and H-bonding. For solid solutes, the approach explicitly takes crystal energies into account, though these are also derived empirically. Their results suggest that the method performs relatively well when tested on a large collection of solutes in a variety of polar and non-polar solvents.

The approaches described above are primarily empirical and require experimentally derived parameters. An alternative is to use ab initio methods, which do not require empirically derived parameters. Unsurprisingly, such approaches can be very time-consuming, especially when accurate crystal lattice energies must be evaluated for solid solutes. However, approaches based on the COSMO formulation [Klamt1995], such as COSMO-RS [Klamt2000] and the use of sigma-profiles [Mullins2008], do allow relatively fast computation of solubilities in a variety of solvents.

Write about Totchasov2007 ALS model
Write about Kan UNIFAC model

Visualization and Querying

The SolubilitySum Spreadsheet

Web Query

Web-Based Viewer

Chemical Spaces

Second Life

Second Life is a freely available multi-user virtual environment that we have used as a platform for real-time collaboration and multi-dimensional data visualization. We have used Second Life to add another spatial dimension to our descriptor chemical space visualizations. The descriptor chemical space is now effectively a 5-dimensional visualization of the solubility data: the three spatial dimensions correspond to CDK descriptors, while point size and color represent solubility and functionality, respectively (see Figure AL1).
SLDescriptorSpace.png

Figure AL1: A 5-dimensional descriptor chemical space in Second Life.
[I'll take a new pic just before we send it in - new data added continually it seems - AL]

Visualizing the solubility data as a 5-dimensional descriptor chemical space in Second Life has several advantages over the web interface. Firstly, Second Life is a naturally collaborative and immersive environment: we can log in together and discuss the solubility data in real time as we each view it from different angles. The sense of immersion also allows interactions with the data that are not possible outside of virtual reality; for example, you can walk into the visualization and look at the data from the inside out. Secondly, adding another CDK descriptor as a third spatial dimension allows for better exploration of the data than the web viewer, and not only because of the obvious ability to look for patterns, trends, and correlations in higher-dimensional spaces.

As with the web browser version of the descriptor chemical spaces, the Second Life viewer allows the user to control the point size, to view summary data for each point via a tool-tip by hovering the cursor over it, and to view the detailed solubility data for each solute by clicking on the point, which opens a browser window with the relevant data. However, one limitation of the web browser descriptor chemical spaces is that points sometimes overlap or are hidden under other points. This can be mitigated somewhat by making the points smaller, but doing so does not eliminate the problem. By viewing the data in Second Life, where the user can move around the points, the problem of hidden or hard-to-see data points can be avoided completely.

As already discussed, one goal of the project is to discover the best solvents for producing Ugi products. To aid in this, we have developed an interactive Ugi solubility explorer interface in Second Life (see Figure AL2).
Solubility2.png

Figure AL2: The Ugi solubility explorer in Second Life.

The Ugi solubility explorer allows the user to select combinations of solutes (an amine, an aldehyde, a carboxylic acid, and an isonitrile) from a board that displays the solutes from the Google Docs spreadsheet in real time. When a solute is selected, the Ugi solubility explorer queries the solubility data in the Google Docs spreadsheet and displays the solubility values for ethanol, methanol, THF, and acetonitrile as a 3-dimensional bar chart. Users can click on each bar to open a web browser displaying the detailed solubility data for that particular measurement. This visualization allows us not only to see the best solvent (or solvents) for a particular four-component combination of solutes but also to see where the data is sparse or missing, which in turn helps prioritize solubility measurement experiments.
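The lookup behind such an explorer can be sketched in a few lines of Python. This is only an illustration, not the Second Life script itself: the function name explore, the in-memory dictionary standing in for the spreadsheet, the generic solute names, and all numeric values are hypothetical placeholders. It also assumes that the "best" solvent is the one in which the least soluble of the four components is most soluble, which is one possible criterion among several.

```python
# Solvents shown by the explorer, as listed in the text.
SOLVENTS = ("ethanol", "methanol", "THF", "acetonitrile")


def explore(solubility, solutes, solvents=SOLVENTS):
    """Tabulate solubilities (M) for the chosen solutes and report gaps.

    solubility maps (solute, solvent) -> molar solubility; absent keys
    mean that no measurement is available yet.
    """
    table = {s: {v: solubility.get((s, v)) for v in solvents} for s in solutes}
    # Pairs with no data are candidates for new measurements.
    missing = [(s, v) for s in solutes for v in solvents if table[s][v] is None]
    # Only rank solvents for which every component has a value.
    complete = [v for v in solvents
                if all(table[s][v] is not None for s in solutes)]
    best = None
    if complete:
        # Assumed criterion: maximize the solubility of the worst component.
        best = max(complete, key=lambda v: min(table[s][v] for s in solutes))
    return table, missing, best


# Placeholder values for illustration only, not measurements.
data = {
    ("amine", "ethanol"): 1.5, ("amine", "methanol"): 2.0,
    ("amine", "THF"): 1.0, ("amine", "acetonitrile"): 0.8,
    ("aldehyde", "ethanol"): 0.1, ("aldehyde", "methanol"): 0.05,
    ("aldehyde", "THF"): 0.9,
    ("acid", "ethanol"): 0.2, ("acid", "methanol"): 0.08,
    ("acid", "THF"): 0.7, ("acid", "acetonitrile"): 0.3,
    ("isonitrile", "ethanol"): 1.0, ("isonitrile", "methanol"): 1.0,
    ("isonitrile", "THF"): 1.0, ("isonitrile", "acetonitrile"): 1.0,
}
components = ("amine", "aldehyde", "acid", "isonitrile")
table, missing, best = explore(data, components)
print("missing:", missing)
print("best solvent:", best)
```

The list of missing pairs corresponds to the sparse regions of the bar chart and could be used to prioritize new measurements.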

Integration and Dissemination

RDF

ChemSpider - assuming sol data can be routinely integrated into CS

[Google? -AL]



[Totchasov2007] Totchasov, E. D., Nikiforov, M. Y. and Al’per, G. A. "Calculation of solubility of solid solutes in mixed non-aqueous solvents within the limits of the theory of molecular association", J. Struct. Chem., 2007, 48(3), 474-478. (doi: 10.1007/s10947-007-0071-3 )

[Kan1996] Kan, A. T. and Tomson, M. B. "UNIFAC Prediction of Aqueous and Nonaqueous Solubilities of Chemicals with Environmental Interest", Environ. Sci. Technol., 1996, 30 (4), pp 1369–1376. (doi: 10.1021/es950638o )

Description of the models

Assessment of the applicability of models for general solubility prediction


Engineering the Ugi reaction using solubility control

The 4-pyrenebutanoic acid and phenanthrene-9-carboxaldehyde problem

For some Ugi reactions, reagents that are by themselves poorly soluble can be brought into solution in the presence of the other reactants. Although it is possible to prepare such products manually, this precludes the use of automation, which relies on liquid handling and for which stock solutions of 2 M in methanol are optimal. However, the solubilities of 4-pyrenebutanoic acid and phenanthrene-9-carboxaldehyde in methanol are less than 0.1 M.

175C.jpg


Exploring the solubility space of phenanthrene-9-carboxaldehyde reveals that XXX and XXX are far superior solvents.
Exploring the solubility space of 4-pyrenebutanoic acid reveals that XXX is a possible solvent for the Ugi reaction.

Conclusion

BD2.png

Figure AL3: ONS Challenge Workflow Summary
[If we decide to include this one, I will update it just before we send it in - AL]

Methods of Solubility Determination

Evaporation

The earliest measurements for the ONS Challenge were carried out using a SpeedVac. The basic protocol involves making a saturated solution, removing a measured volume of supernatant, and evaporating it to obtain the mass of solute remaining. In principle this is a straightforward measurement, but we encountered some difficulties. The method assumes that the solute is not volatile, that the solvent is completely removed, and that the volume measurement is accurate. The first of these assumptions led to some erroneous initial results for 4-chlorobenzaldehyde.[evapref] Although in that instance the vacuum was measured, in many laboratories that are set up for the evaporation of water from biochemical materials no pressure gauge is available. The volume measurement of organic solvents with micropipettes designed to deliver aqueous solutions represents an additional uncertainty.
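The arithmetic behind the evaporation method is simple enough to write down. The following Python sketch converts the mass of dried residue and the aliquot volume into a molar concentration; the function name and the example numbers are hypothetical and chosen only for illustration, and the calculation is valid only under the three assumptions listed above.

```python
def molar_solubility(residue_mass_g, molar_mass_g_per_mol, aliquot_volume_ml):
    """Concentration (mol/L) of a saturated solution from an evaporated aliquot.

    Assumes the solute is not volatile, the solvent is completely removed,
    and the aliquot volume is accurate.
    """
    if residue_mass_g < 0 or molar_mass_g_per_mol <= 0 or aliquot_volume_ml <= 0:
        raise ValueError("mass must be non-negative; molar mass and volume positive")
    moles = residue_mass_g / molar_mass_g_per_mol
    return moles / (aliquot_volume_ml / 1000.0)  # convert mL to L


# Invented example: 70.3 mg of residue from a 0.500 mL aliquot of a
# solute with molar mass 140.57 g/mol gives roughly 1 M.
concentration = molar_solubility(0.0703, 140.57, 0.500)
print(f"{concentration:.2f} M")
```

Any loss of a volatile solute during evaporation lowers the residue mass and therefore biases the result low.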

UV-vis spectroscopy

The use of UV-visible spectroscopy was briefly explored. It does not suffer from the disadvantage of requiring nonvolatile solutes. However, a major limitation is the necessity of preparing calibration runs for every solvent-solute combination, which greatly slows down the process and introduces more opportunities for student error. In addition, not all solutes bear the requisite chromophores.

Sequential Precipitation

One of the assumptions made in all the above techniques is that truly saturated solutions were prepared. At the start of this project, vortexing was used to prepare the solutions. It became apparent over time that some of the values determined in this way were underestimated because of insufficient mixing. A new criterion was therefore developed, requiring the presence of solid during a minimum of 30 minutes of sonication. This treatment also heated the solution, and when additional solid was observed precipitating as the solution returned to room temperature, saturation could be established.

It is sometimes difficult to tell if more solid is accumulating during cooling. However, it is very clear when solid appears from a clear solution. Since it is difficult to predict how much solid relative to solvent will produce a clear solution, a variation on this approach is to prepare a series of known amounts of materials in screw-capped vials. All the samples are heated to an elevated temperature that is still well below the boiling point of the solvent. After clear solutions are obtained, the samples are slowly cooled and the temperature at which solid precipitates from each sample is recorded. The concentration at room temperature (23 °C) can be interpolated from the nearest points above and below. The use of a thermostatted bath facilitates the execution of the experiment.
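The interpolation step can be illustrated with a short Python sketch. It assumes a straight line between the two bracketing points, which the text does not specify; the function name and the data points are hypothetical.

```python
def interpolate_solubility(points, target_temp_c=23.0):
    """Estimate the saturation concentration at target_temp_c.

    points is a list of (precipitation_temperature_C, concentration_M)
    pairs, one per vial. Uses linear interpolation between the nearest
    points below and above the target (an assumption of this sketch).
    """
    below = [p for p in points if p[0] <= target_temp_c]
    above = [p for p in points if p[0] >= target_temp_c]
    if not below or not above:
        raise ValueError("target temperature is not bracketed by the data")
    t_lo, c_lo = max(below, key=lambda p: p[0])  # nearest point below
    t_hi, c_hi = min(above, key=lambda p: p[0])  # nearest point above
    if t_hi == t_lo:  # a vial precipitated exactly at the target
        return c_lo
    fraction = (target_temp_c - t_lo) / (t_hi - t_lo)
    return c_lo + fraction * (c_hi - c_lo)


# Invented precipitation temperatures and concentrations for four vials.
vials = [(15.0, 0.40), (20.0, 0.50), (28.0, 0.66), (35.0, 0.90)]
estimate = interpolate_solubility(vials)
print(f"Estimated solubility at 23 C: {estimate:.2f} M")
```

Because only the two nearest vials are used, the spacing of concentrations around room temperature determines how good the estimate can be.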

After the experiment, one of the concentrations can be verified by NMR using the solvent as internal standard, which also ensures that the solution contains only pure solute and solvent. The supernatant from one of the concentrated samples can also be analyzed by NMR to confirm the room temperature interpolation. This sequential precipitation technique thus affords not only two ways to measure the room temperature solubility but also allows the collection of temperature-solubility curves. We have recently extended the exploration to construct multi-dimensional solubility spaces of solvent mixtures. These measurements are stored in the SolSumMix Google Spreadsheet.

[Figure 4 3D sol space measured and regressed]

1. Outlier Bot

Specifying thresholds for the ratio of the standard deviation to the mean and for the Grubbs outlier term returns a list of measurements in each solvent that need further investigation. This is a very convenient way to catch errors of any type. The user can then examine the laboratory notebook pages and all associated raw data to try to uncover the measurements that are clearly in error. Common reasons for flagging measurements as "DONOTUSE" include insufficient mixing, missing details in logs, likely partial evaporation of the saturated solution before taking the NMR, volatility of solutes in evaporation experiments, and insufficient relaxation time during NMR acquisition for solutes bearing only aromatic hydrogens. In cases where it is unclear why there is a discrepancy in the measurements, a request to repeat the measurement is made in the DoSol Spreadsheet. In general, values less than 0.1 M are not repeated because of the lower precision of NMR techniques at very low concentrations. Use of this bot recently prompted further examination of some results for the solubility of 4-nitrobenzaldehyde in methanol that were difficult to reproduce from the literature, and revealed that solutes of this type react with alcoholic solvents.
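A minimal Python sketch of the two statistics follows. It is not the bot's actual code: the function name, the default thresholds, the rule that both thresholds must be exceeded, and the example values are all assumptions made for illustration. The Grubbs term is computed in its usual form, the largest absolute deviation from the mean divided by the sample standard deviation.

```python
from statistics import mean, stdev


def check_measurements(values, max_rel_sd=0.1, max_grubbs=1.4):
    """Return outlier statistics for repeated measurements of one
    solute/solvent pair, or None if they cannot be computed.

    The thresholds are placeholders. For n values the Grubbs term cannot
    exceed (n - 1) / sqrt(n), so a useful threshold depends on n.
    """
    if len(values) < 3:
        return None  # too few points to single out an outlier
    m = mean(values)
    s = stdev(values)  # sample standard deviation
    if m == 0 or s == 0:
        return None
    suspect = max(values, key=lambda v: abs(v - m))  # farthest from the mean
    rel_sd = s / m
    grubbs = abs(suspect - m) / s
    return {
        "rel_sd": rel_sd,
        "grubbs": grubbs,
        "suspect": suspect,
        # Assumed rule: flag only when both thresholds are exceeded.
        "flagged": rel_sd > max_rel_sd and grubbs > max_grubbs,
    }


# Invented molar solubilities for two solute/solvent pairs.
scattered = check_measurements([0.50, 0.52, 0.51, 1.20])
consistent = check_measurements([0.50, 0.52, 0.51, 0.49])
print(scattered)
print(consistent)
```

A flagged result only identifies the most suspicious value; deciding whether it deserves a "DONOTUSE" mark still requires going back to the notebook pages and raw data.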

2. The solubility request form

In the spirit of crowdsourcing measurements, the ONS Challenge encourages the submission of requests for solubilities. Requests can be added directly to the DoSol spreadsheet or, more conveniently, through a web-based form. A recent example involves a request for the solubility of pyrene in acetonitrile. The requester was interested in extracting pyrene from soil in an environmental study in Israel. Such interactions could allow collaborations to develop that would otherwise not happen. The risk of overly diluting the research efforts of participating groups is minimal, since ultimately the project manager controls the priority of all requests.

3. Keyword searches

Keeping track of searches on all web pages associated with the ONS Challenge is very helpful for understanding how people find data. Currently most queries originate either from general Google searches or from Wikipedia entries. Both a public Sitemeter and Google Analytics are used to keep track of traffic and searches. In most cases where a search consists of a particular solute and solvent, the measurements are available. However, sometimes a new combination of solute and solvent is requested (p-toluenesulfonic acid in ethanol and phthalic acid in chloroform are recent examples). If we consider these searches as implicit requests, it makes sense to funnel them to the DoSol Spreadsheet. Although we don't know who the requesters are, it is logical to infer that there was a legitimate need for the information. All other things being equal, measuring these solubilities earlier than we otherwise would have is more likely to benefit the chemistry community.

4. Solubility Modeling
pending

5. Application based models
The practical application of the ONS Challenge is for chemists to use solubility data to select reaction and purification conditions (solvent, concentration and possibly temperature). For example, as will be described below, a model is used to predict whether or not a product will precipitate from a Ugi reaction. For cases where the model fails to predict the observed outcome, the solubility data for the Ugi product and starting materials are added to the DoSol Spreadsheet.

References

1) Bradley, J.-C. UsefulChem blog September 28, 2008 (http://usefulchem.blogspot.com/2008/09/open-notebook-science-challenge.html)
2) Bradley, J.-C. UsefulChem blog November 3, 2008 (http://usefulchem.blogspot.com/2008/11/submeta-open-notebook-science-awards.html)
3) Bradley, Jean-Claude; Mirza, Khalid; Owens, Kevin; Osborne, Tom and Williams, Antony (November 2008). "Optimization of the Ugi reaction using parallel synthesis and automated liquid handling". Journal of Visualized Experiments. (doi:10.3791/942)

[Hilal2004] Hilal, S. H., Karickhoff, S. W., and Carreira, L. A.; "Prediction of the Solubility, Activity Coefficient and Liquid/Liquid Partition Coefficient of Organic Compounds", QSAR Comb. Sci., 2004, 23, 709-720

[Jouyban2002] Jouyban, A., Romero, S., Chan, H. K., Clark, B. J., and Bustamante, P.; "A Cosolvency Model to Predict Solubility of Drugs at Several Temperatures from a Limited Number of Solubility Measurements", Chemical & Pharmaceutical Bulletin, 2002, 50, 594-599

[Klamt1995] Klamt, A.; "Conductor-like Screening Model for Real Solvents - a New Approach to the Quantitative Calculation of Solvation Phenomena", J. Phys. Chem., 1995, 99, 2224-2235

[Klamt2000] Klamt, A. and Eckert, F.; "COSMO-RS: A Novel and Efficient Method for the a Priori Prediction of Thermophysical Data of Liquids", Fluid Phase Equilibria, 2000, 172, 43-72

[Kundu2008] Kundu, M., Chitturi, A., and Bandyopadhyay, S. S.; "Prediction of Equilibrium Solubility of CO2 in Aqueous Alkanolamines Using Differential Evolution Algorithm", Can. J. Chem. Eng., 2008, 86, 117-126

[Mullins2008] Mullins, E., Liu, Y. A., Ghaderi, A., and Fast, S. D.; "Sigma Profile Database for Predicting Solid Solubility in Pure and Mixed Solvent Mixtures for Organic Pharmacological Compounds with COSMO-Based Thermodynamic Methods", Ind. Eng. Chem. Res., 2008, 47, 1707-1725

[Williams1984] Williams, N. A. and Amidon, G. L.; "Excess Free-Energy Approach to the Estimation of Solubility in Mixed-Solvent Systems. 1. Theory", J. Pharm. Sci., 1984, 73, 9-13
