=Batch Melting Point Prediction Using MeltingPointModel002=

Introduction
We present here instructions for using our melting point prediction model (MeltingPointModel002) to predict the melting points of organic compounds, either individually or in batches, on your local machine. We assume no previous experience with R or the CDK.

What You'll Need

 * R (a free software environment for statistical computing)
 * The Random Forest Package for R [1] (so you can use our random forest model)
 * The CDK Descriptor Calculator (Rajarshi Guha's GUI for the CDK, needed to calculate the descriptors used in our model)
 * Our RF model for melting point prediction (CC0)

Setting Up The Model
1. If you don't already have R on your machine, download and install it.
2. Create a working directory where you'll put all your work, something like C:\working\directory
3. Download our RF model to your working directory.
4. Download the CDK Descriptor Calculator to your working directory.
5. Download the Random Forest Package for R to your working directory.
6. Start R and install the random forest package by using the menu: Packages -> Install package(s) from local zip files...

You can also install packages directly from a CRAN mirror; see the video "How to install and update packages in R".

Your computer is now set up to use our model.
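If you prefer the R console to the Packages menu, the CRAN route can be sketched as follows. This is a minimal sketch, not part of our original instructions: the helper name missing_packages is our own invention, install.packages() needs an internet connection and may ask you to choose a CRAN mirror, and the install is only attempted when R is running interactively.

```r
# Hypothetical helper (not part of R): return the names in `pkgs`
# that cannot be found in the current R library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install randomForest from CRAN only if it is not already available.
to_install <- missing_packages("randomForest")
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

If to_install comes back empty, the package is already installed and nothing further is needed.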

Calculate CDK descriptors for your compounds
1. Create a text (.txt) file in your working directory and enter the SMILES for your compounds, one per line. Save the file as smiles.txt. SMILES for compounds can easily be found on ChemSpider.
2. Create a blank text file named descriptors.txt in your working directory.
3. Start the CDK Descriptor Calculator (CDKDescUI.jar).
4. Use the browse buttons to select smiles.txt as your Input File and descriptors.txt as your Output File.
5. Make sure all descriptors are selected, then uncheck the following descriptors, which are not used in the model: Charged Partial Surface Area (under electronic), Ionization Potential (under electronic), Amino Acid Count (under protein), all geometrical descriptors, and WHIM (under hybrid).
6. Use the menu to select the correct output format (tab separated): Options -> Output Method -> Tab delimited
7. You're ready; click the Go button.
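Steps 1 and 2 above can also be done from the R console instead of a text editor. The sketch below assumes R's working directory is already your working directory; the three SMILES (ethanol, benzene, benzoic acid) are placeholder examples, to be replaced with your own compounds.

```r
# Example SMILES only: ethanol, benzene, benzoic acid.
smiles <- c("CCO", "c1ccccc1", "OC(=O)c1ccccc1")

# Step 1: write one SMILES per line to smiles.txt.
writeLines(smiles, "smiles.txt")

# Step 2: create an empty descriptors.txt for the calculator to fill.
# Note that file.create() empties the file if it already exists.
file.create("descriptors.txt")
```

The remaining steps (3 to 7) are still carried out in the CDK Descriptor Calculator GUI.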

Calculate the melting point predictions
Start R and in the console enter the following commands one at a time:

```r
library(randomForest)
setwd("C:/working/directory")
mydata.rf <- readRDS(file = "rfmodel.gz")
mytestdata <- read.table(file = "descriptors.txt", as.is = TRUE, header = TRUE, sep = "\t", row.names = "Title")
test.predict <- predict(mydata.rf, mytestdata)
write.csv(test.predict, file = "RFPredict.csv")
```

Once the code has finished executing, you can find your melting point predictions in the RFPredict.csv file in your working directory.
What each command does, in order:

1. Load the random forest package into R
2. Set the working directory
3. Load in the model
4. Load in the data
5. Run the data through the random forest model
6. Write the results to file
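The descriptor calculator may fail to compute some descriptors for some compounds, leaving missing values (NA) in the table, and predict() on a random forest model may stop with an error when it meets them. The sketch below shows one way to predict only for compounds with complete descriptors and mark the rest as NA. The function name predict_complete and the column names compound and predicted_mp are our own choices for illustration. The small linear model is only a stand-in so the example runs without the model file.

```r
# Predict only for rows whose descriptors are all present.
# `model` is any fitted model with a predict() method, for example the
# object returned by readRDS("rfmodel.gz") after library(randomForest).
predict_complete <- function(model, descriptors) {
  ok <- complete.cases(descriptors)
  predictions <- rep(NA_real_, nrow(descriptors))
  if (any(ok)) {
    predictions[ok] <- predict(model, descriptors[ok, , drop = FALSE])
  }
  data.frame(
    compound = rownames(descriptors),
    predicted_mp = predictions,
    stringsAsFactors = FALSE
  )
}

# Stand-in data and model, for illustration only.
train <- data.frame(x = c(1, 2, 3, 4), y = c(2, 4, 6, 8))
toy_model <- lm(y ~ x, data = train)
toy_descriptors <- data.frame(
  x = c(5, NA, 6),
  row.names = c("cmpd1", "cmpd2", "cmpd3")
)
result <- predict_complete(toy_model, toy_descriptors)
print(result)

# With the real files, the call would look like:
# write.csv(predict_complete(mydata.rf, mytestdata),
#           file = "RFPredict.csv", row.names = FALSE)
```

Compounds that come back as NA are those with at least one missing value in their row of the descriptor file.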