Dataset for "Machine learning reaction barriers in low data regimes: a horizontal and diagonal transfer learning approach"

Machine learning (ML) has previously been used to predict density functional theory (DFT) free energy reaction barriers for a variety of reactions from semi-empirical quantum mechanical (SQM) inputs. These models can require expensive dataset curation and can struggle to generalise beyond the dataset's immediate chemical space. One approach that can drastically lower the number of required training points is transfer learning (TL). We demonstrate that various TL techniques can provide highly accurate results with a fraction of the training points required for standard ML, thus lowering the overall computational cost of barrier predictions. This dataset includes all the structural data, in the form of Gaussian16 (Revisions A.03 and C.01) output files, for the Diels-Alder and [3+2] cycloaddition reactions used for this ML/TL analysis. The data archive also includes exemplar code for performing some of the standard ML from the manuscript.

Keywords:
Machine Learning, Transfer Learning, Computational Chemistry, Diels-Alder, Gaussian
Subjects:
Chemical reaction dynamics and mechanisms

Cite this dataset as:
Espley, S., Farrar, E., 2023. Dataset for "Machine learning reaction barriers in low data regimes: a horizontal and diagonal transfer learning approach". Bath: University of Bath Research Data Archive. Available from: https://doi.org/10.15125/BATH-01229.

Data

data_archive_3_2.zip
application/zip (175MB)
Creative Commons: Attribution 4.0

Zip folder containing optimised ground state reactant (1,3-dipoles and dipolarophiles) and transition state structures for [3+2] cycloaddition reactions. These calculations were performed at two levels of theory (AM1 and wB97X-D/def2-TZVP).

data_archive.zip
application/zip (2GB)
Creative Commons: Attribution 4.0

Zip folder containing optimised ground state reactant (dienes and dienophiles) and transition state structures for Diels-Alder reactions. These calculations were performed at multiple levels of theory (AM1, PM3, wB97X-D/def2-TZVP, and DSD-PBEP86-D3(BJ)/def2-TZVP) and are grouped into different enumerations (enum1, enum2, enum3, enum4, and enum5). Included is a dictionaries folder that contains csv files to pair each transition state with its respective diene and dienophile.
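
As a rough illustration of how the pairing files might be used, the sketch below reads a pairing table and computes a free energy barrier relative to the separated reactants. The column names (ts, diene, dienophile), the structure names, and the energies are all hypothetical; the csv files in the dictionaries folder may be laid out differently, so check them before adapting this.

```python
import csv
import io

HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard conversion factor

# Hypothetical pairing table; the real csv files may use other column names.
EXAMPLE_CSV = """ts,diene,dienophile
ts_001,diene_01,dienophile_07
ts_002,diene_01,dienophile_09
"""


def load_pairs(csv_text):
    """Return {ts_name: (diene_name, dienophile_name)} from csv text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["ts"]: (row["diene"], row["dienophile"]) for row in reader}


def barrier_kcal(ts, pairs, free_energies):
    """Barrier as G(TS) - G(diene) - G(dienophile), converted to kcal/mol."""
    diene, dienophile = pairs[ts]
    delta_hartree = (
        free_energies[ts] - free_energies[diene] - free_energies[dienophile]
    )
    return delta_hartree * HARTREE_TO_KCAL_PER_MOL


pairs = load_pairs(EXAMPLE_CSV)

# Made-up free energies in hartree, for illustration only.
free_energies = {
    "ts_001": -350.460,
    "diene_01": -155.900,
    "dienophile_07": -194.600,
}

print(round(barrier_kcal("ts_001", pairs, free_energies), 2))
```

For real use, the in-memory csv text would be replaced by the contents of a file from the dictionaries folder, and the free energies would come from the corresponding Gaussian16 output files at a single, consistent level of theory.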

Code

example_code.zip
application/zip (571kB)
Creative Commons: Attribution 4.0

Zip folder containing exemplar code to perform the machine learning in the paper. The folder contains example.ipynb, endo_dataset.pkl (the extracted data in .pkl file format), environment.yaml (the .yaml file used to generate the environment), and a README.md file.

Creators

Sam Espley
University of Bath

Elliot Farrar
University of Bath

Contributors

Matthew Grayson
Supervisor
University of Bath

Simone Tomasi
Supervisor
AstraZeneca

David Buttar
Supervisor
AstraZeneca

University of Bath
Rights Holder

AstraZeneca
Rights Holder

Documentation

Data collection method:

Ground state reactant and transition state geometries for Diels-Alder reactions were built using Schrödinger’s R-Group Enumeration. R-groups were placed at various positions on both dienes and dienophiles; the positions depended upon the molecules in question. All structures were built in Gaussian16 (Revisions A.03 and C.01) and were conformationally searched using Schrödinger’s MacroModel (version 12.7). All structures were subsequently optimised in Gaussian16 (Revisions A.03 and C.01) using three different molecular modelling methods (AM1, PM3, and wB97X-D/def2-TZVP). A subset of the reactions was also optimised with DSD-PBEP86-D3(BJ)/def2-TZVP. The same process was used for the [3+2] reactions; however, these reactions were only optimised at the AM1 and wB97X-D/def2-TZVP levels of theory.
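
Anyone reusing the output files will probably want to pull free energies out of them and confirm that each job finished cleanly. The following is a minimal sketch of one way to do that with plain text matching. It is not the authors' extraction code. It assumes each file contains a frequency calculation, so that the "Sum of electronic and thermal Free Energies=" line and the "Frequencies --" lines are present. The sample log text and its values are invented for illustration.

```python
import re

# Thermochemistry line printed by Gaussian after a frequency calculation.
FREE_ENERGY_RE = re.compile(
    r"Sum of electronic and thermal Free Energies=\s*(-?\d+\.\d+)"
)


def parse_free_energy(log_text):
    """Return the last reported free energy in hartree, or None if absent."""
    matches = FREE_ENERGY_RE.findall(log_text)
    return float(matches[-1]) if matches else None


def terminated_normally(log_text):
    """True if the output contains Gaussian's normal termination message."""
    return "Normal termination of Gaussian" in log_text


def count_imaginary_frequencies(log_text):
    """Count negative values across all 'Frequencies --' lines.

    Assumes the frequencies are printed once; outputs that repeat the
    frequency block would be double counted.
    """
    count = 0
    for line in log_text.splitlines():
        if line.strip().startswith("Frequencies --"):
            values = line.split("--", 1)[1].split()
            count += sum(1 for value in values if float(value) < 0)
    return count


# Invented excerpt in the style of a Gaussian16 output file.
SAMPLE_LOG = """
 Frequencies --  -512.3456               123.4567               234.5678
 Sum of electronic and thermal Free Energies=         -350.460000
 Normal termination of Gaussian 16 at Wed May 31 12:00:00 2023.
"""

print(parse_free_energy(SAMPLE_LOG))
print(terminated_normally(SAMPLE_LOG))
print(count_imaginary_frequencies(SAMPLE_LOG))
```

A transition state should normally show exactly one imaginary frequency and a ground state none, so the count gives a quick sanity check before any barriers are computed. A dedicated parsing library could be substituted for the regular expressions if more fields are needed.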

Funders

Engineering and Physical Sciences Research Council (EPSRC)
https://doi.org/10.13039/501100000266

Grant
EP/V519637/1

Engineering and Physical Sciences Research Council (EPSRC)
https://doi.org/10.13039/501100000266

Grant
EP/R513155/1

Engineering and Physical Sciences Research Council (EPSRC)
https://doi.org/10.13039/501100000266

Machine Learning and Molecular Modelling: A Synergistic Approach to Rapid Reactivity Prediction
EP/W003724/1

Publication details

Publication date: 31 May 2023
by: University of Bath

Version: 1

DOI: https://doi.org/10.15125/BATH-01229

URL for this record: https://researchdata.bath.ac.uk/id/eprint/1229

Related papers and books

Espley, S. G., Farrar, E. H. E., Buttar, D., Tomasi, S., and Grayson, M. N., 2023. Machine learning reaction barriers in low data regimes: a horizontal and diagonal transfer learning approach. Digital Discovery, 2(4), 941-951. Available from: https://doi.org/10.1039/d3dd00085k.

Contact information

Please contact the Research Data Service in the first instance for all matters concerning this item.

Contact person: Sam Espley

Departments:

Faculty of Science
Chemistry