Dataset for "Machine learning reaction barriers in low data regimes: a horizontal and diagonal transfer learning approach"
Machine learning (ML) has previously been used to predict density functional theory (DFT) free energy reaction barriers for a variety of reactions from semi-empirical quantum mechanical (SQM) inputs. These models can require expensive dataset curation and can struggle to generalise outside the dataset's immediate chemical space. One approach that can drastically lower the number of required training points is transfer learning (TL). We demonstrate that various TL techniques can provide highly accurate results with a fraction of the training points required for standard ML, thus lowering the overall computational cost of barrier predictions. This dataset includes all the structural data, in the form of Gaussian16 (Revisions A.03 and C.01) output files, for the Diels-Alder and [3+2] cycloaddition reactions used in this ML/TL analysis. The archive also includes exemplar code for performing some of the standard ML from the manuscript.
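The core idea behind transfer learning, reusing what a model learned on a data-rich task to reduce the data needed on a related task, can be illustrated with its simplest form: pretrain a model, then fine-tune it starting from the pretrained weights. The sketch below is a hypothetical, pure-Python toy and does not reproduce the horizontal or diagonal TL models from the paper. It assumes a one-feature linear model, where `x` stands in for a pre-scaled SQM-derived feature and `y` for a DFT barrier in kcal/mol; all data are synthetic, and the names `fit_linear` and `mse` are illustrative.

```python
"""Hypothetical sketch: transfer learning by warm-starting (fine-tuning) a model.

A linear model y = w*x + b stands in for a barrier-prediction model. x is an
assumed, pre-scaled SQM-derived feature; y is a DFT barrier in kcal/mol.
All data are synthetic and for illustration only.
"""


def fit_linear(xs, ys, w=0.0, b=0.0, lr=0.5, epochs=2000):
    """Full-batch gradient descent on mean squared error, starting from (w, b)."""
    n = len(xs)
    for _ in range(epochs):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the model (w, b) on the points (xs, ys)."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Source task: plenty of (synthetic) labelled reactions.
src_x = [i / 10 for i in range(11)]
src_y = [10.0 * x + 20.0 for x in src_x]

# Target task: same trend shifted by 2 kcal/mol, but only three labelled points.
tgt_x = [0.0, 0.5, 1.0]
tgt_y = [10.0 * x + 22.0 for x in tgt_x]

# Held-out target points, used only for evaluation.
test_x = [0.25, 0.75]
test_y = [10.0 * x + 22.0 for x in test_x]

# 1) Pretrain on the source task.
w_src, b_src = fit_linear(src_x, src_y)

# 2) Fine-tune on the small target set, starting from the source weights.
w_tl, b_tl = fit_linear(tgt_x, tgt_y, w=w_src, b=b_src, epochs=10)

# Baseline: the same small training budget, but starting from zero.
w_0, b_0 = fit_linear(tgt_x, tgt_y, epochs=10)

print(f"fine-tuned test MSE:   {mse(test_x, test_y, w_tl, b_tl):.4f}")
print(f"from-scratch test MSE: {mse(test_x, test_y, w_0, b_0):.4f}")
```

Because the fine-tuned model starts close to the target solution, it reaches a lower held-out error than the zero-initialised baseline under the same training budget. Real barrier models are far richer, but the warm-start principle is the same.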
Cite this dataset as:
Espley, S.,
Farrar, E.,
2023.
Dataset for "Machine learning reaction barriers in low data regimes: a horizontal and diagonal transfer learning approach".
Bath: University of Bath Research Data Archive.
Available from: https://doi.org/10.15125/BATH-01229.
Data
data_archive_3_2.zip
application/zip (175MB)
Creative Commons: Attribution 4.0
Zip folder containing optimised ground state reactant (1,3-dipoles and dipolarophiles) and transition state structures for [3+2] cycloaddition reactions. These calculations were performed at two levels of theory (AM1 and wB97X-D/def2-TZVP).
data_archive.zip
application/zip (2GB)
Creative Commons: Attribution 4.0
Zip folder containing optimised ground state reactant (dienes and dienophiles) and transition state structures for Diels-Alder reactions. These calculations were performed at multiple levels of theory (AM1, PM3, wB97X-D/def2-TZVP, and DSD-PBEP86-D3(BJ)/def2-TZVP) and are grouped into different enumerations (enum1, enum2, enum3, enum4, and enum5). A dictionaries folder is included, containing CSV files that pair each transition state with its respective diene and dienophile.
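The pairing files are what turn individual structure energies into reaction barriers. One common definition of the free energy barrier relative to separated reactants is ΔG‡ = G(TS) − G(diene) − G(dienophile). The sketch below applies that definition to rows read with Python's `csv` module. The column names `ts`, `diene`, and `dienophile`, the structure names, and all energies are hypothetical placeholders, not the actual layout or values of the dictionaries folder.

```python
import csv
import io

# 1 Hartree is approximately 627.509 kcal/mol.
HARTREE_TO_KCAL = 627.509


def barriers_from_pairs(pair_rows, free_energies):
    """Return {ts_name: barrier in kcal/mol}.

    pair_rows: iterable of dicts with hypothetical keys "ts", "diene", "dienophile".
    free_energies: dict mapping structure name -> free energy in Hartree.
    """
    barriers = {}
    for row in pair_rows:
        g_ts = free_energies[row["ts"]]
        g_reactants = free_energies[row["diene"]] + free_energies[row["dienophile"]]
        barriers[row["ts"]] = (g_ts - g_reactants) * HARTREE_TO_KCAL
    return barriers


# Stand-in for one pairing CSV; the real column names may differ.
pairs_csv = io.StringIO(
    "ts,diene,dienophile\n"
    "ts_001,diene_A,dienophile_X\n"
    "ts_002,diene_A,dienophile_Y\n"
)

# Illustrative free energies in Hartree (not values from the dataset).
energies = {
    "ts_001": -350.000,
    "ts_002": -388.010,
    "diene_A": -155.030,
    "dienophile_X": -195.010,
    "dienophile_Y": -233.000,
}

barriers = barriers_from_pairs(csv.DictReader(pairs_csv), energies)
for ts, barrier in barriers.items():
    print(f"{ts}: {barrier:.2f} kcal/mol")
```

If the paper's barriers include additional corrections (for example a standard-state correction) or are referenced to a different reactant state, the subtraction would need to be adjusted accordingly.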
Code
example_code.zip
application/zip (571kB)
Creative Commons: Attribution 4.0
Zip folder containing exemplar code to perform the machine learning in the paper. The folder contains example.ipynb, endo_dataset.pkl (the extracted data in .pkl format), environment.yaml (the file used to create the environment), and a README.md file.
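A .pkl file is read back with Python's `pickle` module. The record does not state what kind of object endo_dataset.pkl holds; if it is a pandas DataFrame, pandas must be installed (for example from environment.yaml) before it can be unpickled, and `pandas.read_pickle` would work equally well. The sketch below uses a hypothetical stand-in object so that it runs without the archive; `load_pickle` and `describe` are illustrative helper names. Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile


def load_pickle(path):
    """Load one object from a pickle file (only use with trusted files)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def describe(obj):
    """Short, type-agnostic summary of a loaded object."""
    shape = getattr(obj, "shape", None)  # e.g. a DataFrame or array
    if shape is not None:
        return f"{type(obj).__name__} with shape {shape}"
    if hasattr(obj, "__len__"):
        return f"{type(obj).__name__} with {len(obj)} entries"
    return type(obj).__name__


# Demo with a hypothetical stand-in object; with the archive, pass the
# path to endo_dataset.pkl to load_pickle instead.
demo = {"ts_001": 25.1, "ts_002": 12.55}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.pkl")
    with open(path, "wb") as fh:
        pickle.dump(demo, fh)
    loaded = load_pickle(path)

print(describe(loaded))  # dict with 2 entries
```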
Contributors
Matthew Grayson
Supervisor
University of Bath
Simone Tomasi
Supervisor
AstraZeneca
David Buttar
Supervisor
AstraZeneca
University of Bath
Rights Holder
AstraZeneca
Rights Holder
Documentation
Data collection method:
Ground state reactant and transition state geometries for Diels-Alder reactions were built using Schrödinger’s R-Group Enumeration. R-groups were placed at various positions on both dienes and dienophiles; the positions depended on the molecules in question. All structures were built in Gaussian16 (Revisions A.03 and C.01) and were conformationally searched using Schrödinger’s MacroModel (version 12.7). All structures were subsequently optimised in Gaussian16 (Revisions A.03 and C.01) with three different molecular modelling methods (AM1, PM3, and wB97X-D/def2-TZVP). A subset of the reactions was also optimised with DSD-PBEP86-D3(BJ)/def2-TZVP. The same process was used for the [3+2] reactions; however, these reactions were only optimised at the AM1 and wB97X-D/def2-TZVP levels of theory.
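Free energies can be pulled directly from the Gaussian16 output files in either archive. Gaussian prints a line beginning "Sum of electronic and thermal Free Energies=" (in Hartree) after a frequency calculation, and "Normal termination of Gaussian" when a job finishes cleanly. The sketch below assumes that the outputs contain frequency calculations and use a .log or .out extension; neither is confirmed by this record. It scans a zip archive with Python's `zipfile` module without extracting it. The helper names and the demo archive are hypothetical.

```python
import os
import re
import tempfile
import zipfile

FREE_ENERGY_RE = re.compile(r"Sum of electronic and thermal Free Energies=\s*(-?\d+\.\d+)")


def read_free_energy(text):
    """Return the last free energy (Hartree) printed in Gaussian output text, or None."""
    matches = FREE_ENERGY_RE.findall(text)
    return float(matches[-1]) if matches else None


def free_energies_in_zip(zip_path, suffixes=(".log", ".out")):
    """Map each Gaussian output inside a zip archive to its free energy (Hartree).

    Files without "Normal termination of Gaussian", or without a free energy
    line, are skipped. The file extensions are an assumption.
    """
    results = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(suffixes):
                continue
            text = archive.read(name).decode("utf-8", errors="replace")
            if "Normal termination of Gaussian" not in text:
                continue
            energy = read_free_energy(text)
            if energy is not None:
                results[name] = energy
    return results


# Demo on a tiny synthetic archive; the text mimics two lines of a Gaussian output.
sample = (
    " Sum of electronic and thermal Free Energies=          -350.012345\n"
    " Normal termination of Gaussian 16 at Wed May 31 12:00:00 2023.\n"
)
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("ts_001.log", sample)
        zf.writestr("crashed.log", " Sum of electronic and thermal Free Energies=  -1.0\n")
        zf.writestr("notes.txt", "not a Gaussian output")
    energies = free_energies_in_zip(demo_zip)

print(energies)
```

Reading members one at a time keeps memory use modest even for the 2GB Diels-Alder archive, since only one output file is decoded at a time.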
Funders
Engineering and Physical Sciences Research Council
https://doi.org/10.13039/501100000266
Industrial CASE Account – University of Bath 2020
EP/V519637/1
Engineering and Physical Sciences Research Council
https://doi.org/10.13039/501100000266
DTP 2018-19 University of Bath
EP/R513155/1
Engineering and Physical Sciences Research Council
https://doi.org/10.13039/501100000266
Machine Learning and Molecular Modelling: A Synergistic Approach to Rapid Reactivity Prediction
EP/W003724/1
Publication details
Publication date: 31 May 2023
by: University of Bath
Version: 1
DOI: https://doi.org/10.15125/BATH-01229
URL for this record: https://researchdata.bath.ac.uk/id/eprint/1229
Related papers and books
Espley, S. G., Farrar, E. H. E., Buttar, D., Tomasi, S., and Grayson, M. N., 2023. Machine learning reaction barriers in low data regimes: a horizontal and diagonal transfer learning approach. Digital Discovery, 2(4), 941-951. Available from: https://doi.org/10.1039/d3dd00085k.
Contact information
Please contact the Research Data Service in the first instance for all matters concerning this item.
Contact person: Sam Espley
Faculty of Science
Chemistry