
=Open Melting Point Datasets CC0=
This page serves as a means of resolving the exact composition of melting point datasets as these are curated, filtered or combined. Web services, live data feeds and modeling are available from the Melting Point page on the ONSwebservices wiki.

Curators
Jean-Claude Bradley, Andrew Lang and Antony Williams.
 * **ONSMP000**: ([[file:onschallenge/ONSMP000_AlfaAesar_MP.xlsx|ONSCwiki]]) 15591 full raw entries from Alfa Aesar, containing duplicates and non-numerical values.
 * **ONSMP001**: ([|ORU]) 12986 measurements as simple numeric values, converted from mp ranges and other entries with non-numeric characters from Alfa Aesar (ONSMP000).
 * **ONSMP002**: ([|ORU]) 8739 measurements derived from ONSMP001 with redundancies, salts, inorganics and organometallics removed. Silicon-, phosphorus- and boron-containing organic compounds were retained. SMILES, CSIDs and links to the Alfa Aesar catalog are included.
 * **ONSMP003**: ([|ORU]) 4450 measurements from [|Karthikeyan 2005]. Includes SMILES and many descriptors.
 * **ONSMP004**: ([|ORU]) 4084 measurements derived from ONSMP003 - includes compound names and CSIDs - excludes SMILES that did not render properly with OpenEye. 48 compounds present in ONSMP003 were missing from ONSMP004; these have been recovered but do not have associated names or CSIDs: [[file:onschallenge/missing48.xlsx|ONSMP004a]]
 * **ONSMP005**: ([|ORU]) 277 measurements from [|Bergstrom 2003]. Drug molecules separated into training and validation sheets. SMILES provided.
 * **ONSMP006** ([|ORU]) 277 measurements derived from ONSMP005, compiled into one sheet with both SMILES and CSIDs provided.
 * **ONSMP007** (ORU) 3910 highly curated melting points from Karthikeyan dataset ONSMP004, further curated by removal of all duplicate entries (those with very different melting points).
 * **ONSMP008** ([[file:onschallenge/ONSMP008KarthikeyanDuplicates1.xlsx|ONSCwiki]]) 33 duplicates (66 measurements) differing by more than 10 °C, from the Karthikeyan dataset ONSMP003.
 * **ONSMP009** ([[file:onschallenge/ONSMP009badSMILES1.txt|ONSCwiki]]) 311 SMILES that could not be rendered correctly in ChemSketch, from the Karthikeyan dataset ONSMP003.
 * **ONSMP010** ([[file:onschallenge/ONSMP010EPISuite1.xlsx|ONSCwiki]]) 150 SMILES consisting of all EPI melting point data (via ChemSpider) from a 2011-03-04 snapshot of the ChemInfo Validation Sheet. 106 of these have at least one MP from another source. 10 of the 106 show a difference of at least 5 °C between the EPI value and the other sources.
 * **ONSMP011** ([[file:onschallenge/20110220cheminfo.xlsx|ONSCwiki]]) 335 measurements. A snapshot taken 2011-02-20 of the crowdsourced melting point data in the ChemInfo Validation Sheet.
 * **ONSMP012** ([[file:onschallenge/20110303RemovedData.xls|ONSCwiki]]) 1286 measurements removed from the union of ONSMP002, ONSMP003, ONSMP006, and ONSMP011. Data were removed because they were salts, showed a large discrepancy between measurements (greater than 10 °C), were suspected to be erroneous, were unneeded duplicates, or failed to produce CDK descriptors; see meltingpointmodel001.
 * **ONSMP013** ([|ORU]) 12634 highly curated (see ONSMP012 above) unique melting point measurements with common name, SMILES, CSID, and CDK descriptor values, based upon the union of ONSMP002, ONSMP003, ONSMP006, and ONSMP011.
 * **ONSMP014** ([[file:onschallenge/ONSMP014DrugbankRaw.xlsx|ONSwiki]]) 1070 melting points (not simple numeric format) from [|DrugBank] - includes SMILES, InChI, LogP and aqueous solubility when available.
 * **ONSMP015** ([[file:onschallenge/ONSMP015.xlsx|ONSCwiki]]) 875 melting points resulting from a curation of ONSMP014 - added CDK descriptors and EPI Suite measured and predicted melting point values.
 * **ONSMP016** ([[file:onschallenge/ONSMP016.xlsx|ONSCwiki]]) 313 compounds from ONSMP015 with either no EPI Suite measured value or a difference between the EPI Suite measured value and the original DrugBank measured value exceeding 10 °C.
 * **ONSMP017** ([[file:onschallenge/ONSMP017.xlsx|ONSCwiki]]) 562 compounds from ONSMP015 where the difference between the EPI Suite measured value and the original DrugBank measured value does not exceed 10 °C - includes melting point predictions from @MeltingPointModel002.
 * **ONSMP018** ([[file:onschallenge/BellOpenMeltingPointData_With CSIDS.csv|ONSCwiki]]) 1631 curated melting points with CSIDs, originally compiled by Prof. H. M. Bell, Dept of Chemistry, Virginia Tech, Blacksburg, VA 24061. Released as Open Data.
 * **ONSMP019** ([[file:onschallenge/OxfordMSDSOriginal.csv|ONSCwiki]]) 3217 full raw entries from the Oxford MSDS sheets containing ranges, salts, metals, etc.
 * **ONSMP020** ([[file:onschallenge/OxfordMSDSCurated.csv|ONSCwiki]]) 1481 curated (removed: salts, metals, decomposes, sublimes, elements, ">", "<", "a.") melting points with CSIDs from the Oxford MSDS sheets, see ONSMP019.
 * **ONSMP021** ([[file:onschallenge/HughesOriginal.xlsx|ONSCwiki]]) 287 melting points taken from the supplementary material of a 2008 paper by Hughes et al.
 * **ONSMP022** ([[file:onschallenge/HughesCurated.csv|ONSCwiki]]) 262 curated melting points from ONSMP021 with SMILES and CSIDs.
 * **ONSMP023** ([[file:onschallenge/PhysPropMPOriginal.xlsx|ONSCwiki]]) 11,645 compounds with raw melting points from the April 2011 PHYSPROP database of 43,544 compounds, released as Open Data.
 * **ONSMP024** ([[file:onschallenge/PhysPropMPCurated.xlsx|ONSCwiki]]) 9,693 curated compounds from ONSMP023 with CSIDs, with ranges converted to midpoint simple numeric values and all elements, metals, salts, decomposes, >, <, ~ removed.
 * **ONSMP025** ([[file:onschallenge/ONSMP025.xlsx|ONSCwiki]]) 19876 unique compounds with averaged melting points and melting point ranges combined from available sources on June 1, 2011.
 * **ONSMP026** ([[file:onschallenge/GriffithsOriginal.csv|ONSCwiki]]) 3757 melting points (original file - not simple numeric format) extracted from scripts crystal data files by Will Griffiths.
 * **ONSMP027** ([[file:onschallenge/GriffithsCurated.xlsx|ONSCwiki]]) 278 compounds from ONSMP026 with salts, dec, sublimes removed. Compounds without units were also removed because the data without units are a mixture of °C and K.
 * **ONSMP028** ([[file:onschallenge/2011-07-27mpdata.xlsx|ONSCwiki]]) 20152 unique compounds (27906 total measurements) with averaged melting points and melting point ranges combined from available sources on July 27, 2011.
 * **ONSMP029** ([[file:onschallenge/20110727doublevalidated.xlsx|ONSCwiki]]) 2706 highly curated double+ validated (range: 0.1-5 °C) unique compounds (7413 total measurements from ONSMP029sources([[file:onschallenge/ONSMP029sources.xlsx|ONSCwiki]])) taken from ONSMP028, excluding compounds that had at least one chiral center, possessed cis/trans isomerism, or were inorganic or salts.
 * **ONSMP030** ([[file:onschallenge/20110803ONSMP030.xlsx|ONSCwiki]]) 19933 unique compounds from ONSMP028 with salts and metals removed.
 * **ONSMP031** ([[file:onschallenge/ONSMP031.xlsx|ONSCwiki]]) 19515 unique compounds from ONSMP030 with additional compounds removed before modeling, see: ONSMPModel007
 * 1) **ONSMP031TrainingSet** ([|showme]) 16015 unique compounds from ONSMP031 used as the training set for ONSMPModel007
 * 2) **ONSMP031TestSet** ([|showme]) 3500 unique compounds from ONSMP031 used as the test set for ONSMPModel007
 * **ONSMP032** ([[file:onschallenge/ONSMP032.xlsx|ONSCwiki]]) 27792 melting point values collected from the live Open Melting Point Data dataset 20120516.
 * **ONSMP033** (showme) 19410 unique compounds with melting points and CDK descriptors from ONSMP032 used in ONSMPModel009
 * **ONSMP034** (figshare) 20482 unique compounds with 28231 melting point values. 2014-03-03
 * **Bradley ONSMP** (figshare) 28645 melting point values. 2014-05-20
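
Several entries above describe the same kinds of curation step: converting ranges to midpoint numeric values (ONSMP024), dropping entries marked as decomposing, subliming or bounded by ">", "<" or "~" (ONSMP020, ONSMP024), setting aside compounds whose measurements differ by more than 10 °C (ONSMP008, ONSMP012), and averaging what remains (ONSMP025, ONSMP028). The following is a minimal sketch of those steps in Python, not the curators' actual procedure. The function names `to_midpoint` and `combine`, the input format of (compound id, raw string) pairs, and the exact list of rejection markers are assumptions made for illustration.

```python
import re
from collections import defaultdict
from statistics import mean

# Markers that make a raw entry unusable in this sketch (an assumed list,
# modeled on the removals described for ONSMP020 and ONSMP024).
REJECT = re.compile(r"dec|subl|[<>~]", re.IGNORECASE)
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")
# A range separator is "-" or "to" sitting between two numbers; a minus
# sign that is not preceded by a digit is kept as the sign of the number.
RANGE_SEP = re.compile(r"(?<=\d)\s*(?:-|to)\s*(?=-?\d)")


def to_midpoint(raw):
    """Return one numeric value for a raw entry, or None if it is rejected."""
    text = raw.strip()
    if not text or REJECT.search(text):
        return None
    parts = RANGE_SEP.split(text)
    if len(parts) > 2:
        return None
    values = []
    for part in parts:
        match = NUMBER.search(part)
        if match is None:
            return None
        values.append(float(match.group()))
    # A single value is returned unchanged; a range becomes its midpoint.
    return sum(values) / len(values)


def combine(measurements, max_spread=10.0):
    """Average agreeing measurements per compound and flag discrepant ones.

    measurements: iterable of (compound_id, raw_melting_point) pairs.
    max_spread:   largest accepted max-min difference, 10 by default to
                  mirror the 10 °C threshold mentioned in the list above.
    """
    by_compound = defaultdict(list)
    for compound_id, raw in measurements:
        value = to_midpoint(raw)
        if value is not None:
            by_compound[compound_id].append(value)

    averaged, flagged = {}, {}
    for compound_id, values in by_compound.items():
        if max(values) - min(values) > max_spread:
            flagged[compound_id] = values
        else:
            averaged[compound_id] = mean(values)
    return averaged, flagged
```

Calling `combine([("A", "120-122"), ("A", "123"), ("B", "50"), ("B", "68")])` would average compound A and place compound B in the flagged dictionary, since its two values differ by more than the threshold. The sketch assumes all values share one unit; it does not attempt the unit checks described for ONSMP027.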