Batch Melting Point Prediction Using MeltingPointModel002


Introduction

This page explains how to use our melting point prediction model (MeltingPointModel002) to predict the melting points of organic compounds, individually or in batches, on your local machine. No previous experience with R or the CDK (Chemistry Development Kit) is assumed.

What You'll Need

Setting Up The Model

1. If you don't already have R on your machine, download and install it.
2. Create a working directory where you'll put all your work, something like C:\working\directory
3. Download our RF model to your working directory.
4. Download the CDK Descriptor Calculator to your working directory.
5. Download the Random Forest Package for R to your working directory.
6. Start R and install the random forest package from the menu: Packages -> Install package(s) from local zip files...
You can also install packages directly from a CRAN mirror; see the video "How to install and update packages in R".
Your computer is now set up to use our model.
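
The CRAN route can also be scripted from the R console. The following is a minimal sketch, not part of the original instructions: the helper name ensure_package is hypothetical, and the mirror URL is an assumption (any CRAN mirror should work).

```r
# Hypothetical helper: install a package from CRAN only if it is missing,
# then report whether it is available.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Downloads and installs the package; needs an internet connection
    install.packages(pkg, repos = repos)
  }
  # TRUE if the package can now be loaded, FALSE otherwise
  requireNamespace(pkg, quietly = TRUE)
}
```

Calling ensure_package("randomForest") once, before library(randomForest), should have the same effect as installing from the local zip file.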

Using the Model

Calculate CDK descriptors for your compounds

1. Create a text (.txt) file in your working directory and enter the SMILES for your compounds, one on each line. Save your file as smiles.txt. SMILES for compounds can easily be found on ChemSpider.
2. Create a blank text file named descriptors.txt in your working directory.
3. Start the CDK Descriptor Calculator (CDKDescUI.jar)
4. Use the browse buttons to select smiles.txt as your Input File and descriptors.txt as your Output File.
5. Make sure all descriptors are selected, then uncheck the following descriptors: Charged Partial Surface Area (under electronic), Ionization Potential (under electronic), Amino Acid Count (under protein), all geometrical descriptors, and WHIM (under hybrid). They are not used in the model.
6. Use the menu to select the tab-separated output format: Options -> Output Method -> Tab delimited
7. Click the Go button to calculate the descriptors.
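
Steps 1 and 2 can also be done from the R console instead of a text editor. This is a minimal sketch, assuming the working directory has already been set with setwd(); the three SMILES strings (ethanol, benzene, acetic acid) are only example inputs.

```r
# Example SMILES strings: ethanol, benzene, acetic acid
smiles <- c("CCO", "c1ccccc1", "CC(=O)O")

# Write one SMILES per line to smiles.txt in the current working directory
writeLines(smiles, con = "smiles.txt")

# Create the empty output file for the CDK Descriptor Calculator
# (note: file.create() empties descriptors.txt if it already exists)
file.create("descriptors.txt")

# Read the input file back to confirm its contents
print(readLines("smiles.txt"))
```

Both files can then be selected with the browse buttons as described in step 4.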

Calculate the melting point predictions

Start R and in the console enter the following commands one at a time:
## load the random forest package into R
library(randomForest)
## set the working directory
setwd("C:/working/directory")
## load in the model (readRDS() is the public function in R 2.13.0 and later; older versions used .readRDS())
mydata.rf <- readRDS(file="rfmodel.gz")
## load in the data
mytestdata <- read.table(file = "descriptors.txt", as.is = TRUE, header = TRUE, sep = "\t", row.names = "Title")
## run the data through the random forest model
test.predict <- predict(mydata.rf,mytestdata)
## write the results to file
write.csv(test.predict, file = "RFPredict.csv")
Once the code has finished executing, you can find your melting point predictions in the RFPredict.csv file in your working directory.
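
If the descriptor table contains missing values (NA), predict() may stop with an error rather than return predictions. The sketch below shows one way to find the affected compounds before running the model. It is an illustration only: the helper name rows_with_missing is hypothetical, and the toy table with its column names and values is invented so that the example runs without the real descriptors.txt.

```r
# Hypothetical helper: return the row names (compound titles) of rows
# that have at least one missing descriptor value
rows_with_missing <- function(descriptors) {
  rownames(descriptors)[!complete.cases(descriptors)]
}

# Toy descriptor table; column names and values are invented for illustration
toy <- data.frame(
  desc1 = c(1.2, NA, 3.4),
  desc2 = c(0.5, 0.7, 0.9),
  row.names = c("cmpd1", "cmpd2", "cmpd3")
)

# List the compounds with incomplete descriptors
bad <- rows_with_missing(toy)
print(bad)

# Keep only the complete rows for prediction
clean <- toy[complete.cases(toy), , drop = FALSE]
```

With the real data, rows_with_missing(mytestdata) would list the compounds to inspect or remove before calling predict().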

References

[1] A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.