Dataset for "Distortion/Interaction Analysis via Machine Learning"

Machine learning (ML) has previously been applied to predict reaction barriers for a variety of chemical reactions. The predicted barrier is often treated as the end point of such studies; however, analysis beyond the reaction barrier, such as energy decomposition approaches, can provide further insight into chemical reactivity. One such approach, previously used to study cycloaddition reactions in particular, is distortion/interaction-activation strain analysis (DIAS). We demonstrate that ML can be coupled with cheap and rapid semi-empirical quantum mechanical (SQM) methods to predict distortion and interaction energies at a fraction of the computational cost of density functional theory (DFT) calculations. This dataset includes all the structural data, in the form of Gaussian16 (Revisions A.03 and C.01) output files, for the four datasets used in this work and for the literature dataset reactions.

Keywords:
Machine Learning, Gaussian, Distortion, Interaction, Activation, Strain, Reaction Barrier, Computational Chemistry, Diels-Alder, Michael Addition, Cycloadditions
Subjects:
Chemical reaction dynamics and mechanisms

Cite this dataset as:
Espley, S., Allsop, S., 2024. Dataset for "Distortion/Interaction Analysis via Machine Learning". Bath: University of Bath Research Data Archive. Available from: https://doi.org/10.15125/BATH-01398.

Data

data_archive.zip
application/zip (4GB)
Creative Commons: Attribution 4.0

A zipped directory containing distortion/interaction calculations for four datasets: nitro-Michael addition (MA), Diels-Alder, [3+2] cycloaddition, and dimethyl malonate MA. These calculations were performed at both AM1 and the DFT level of theory used in the original dataset. For the dimethyl malonate MA dataset, the reactant and transition structure geometries are also provided; these were calculated at AM1 and wB97X-D/def2-TZVP (IEFPCM=Water)//wB97X-D/def2-TZVP.
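Because the archive consists of Gaussian16 output files, the energies can be read back from the text of each file. The following is a minimal sketch, not the authors' code: it assumes each output contains the usual "SCF Done:" line, and the function name `last_scf_energy` is hypothetical. The directory layout inside data_archive.zip is not described in this record, so locating the files is left to the reader.

```python
import re

# Matches lines such as:
#  SCF Done:  E(RwB97XD) =  -382.123456789     A.U. after   12 cycles
SCF_PATTERN = re.compile(
    r"SCF Done:\s+E\((\S+?)\)\s*=\s*(-?\d+\.\d+(?:[DE][+-]?\d+)?)"
)


def last_scf_energy(log_text):
    """Return (method_label, energy_in_hartree) from the final SCF Done line.

    Hypothetical helper: takes the full text of one Gaussian output file.
    The last match is used because an optimisation prints one line per step.
    """
    matches = SCF_PATTERN.findall(log_text)
    if not matches:
        raise ValueError("no 'SCF Done' line found")
    method, value = matches[-1]
    # Accept Fortran-style exponents (D) as well as plain decimals.
    return method, float(value.replace("D", "E"))
```

To use it on a file, read the file contents first, for example `last_scf_energy(open(path).read())`, where `path` points to one output file from the extracted archive.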

GitHub Link: https://github.com/the-grayson-group/distortion-interaction_ML

Creators

Sam Espley
University of Bath

Sam Allsop
University of Bath

Contributors

David Buttar
Supervisor
AstraZeneca

Simone Tomasi
Supervisor
AstraZeneca

Matthew Grayson
Supervisor
University of Bath

University of Bath
Rights Holder

AstraZeneca
Rights Holder

Documentation

Data collection method:

Ground state reactant and transition state geometries for the dimethyl malonate Michael addition reactions were built using Schrödinger’s R-Group Enumeration. R-groups were placed at various positions on the Michael acceptor, with the positions depending on the molecule in question. All structures were built in Gaussian16 (Revisions A.03 and C.01) and conformationally searched using Schrödinger’s MacroModel (version 12.7). All structures were subsequently optimised in Gaussian16 (Revisions A.03 and C.01) at AM1 and wB97X-D/def2-TZVP (IEFPCM=Water)//wB97X-D/def2-TZVP. For distortion/interaction-activation strain calculations, Python code (available on the associated GitHub page: https://github.com/the-grayson-group/distortion-interaction_ML) was used to separate the distorted reactant structures; single point energies were then calculated in Gaussian16 (Revision C.01) at AM1 and at the DFT level of theory used in the original transition structure calculation.
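Once the single point energies are available, the distortion/interaction decomposition itself is simple arithmetic. The sketch below uses the standard DIAS definitions: the distortion energy is the energy of each distorted reactant minus that of its relaxed geometry, summed over reactants, and the interaction energy is the transition structure energy minus the sum of the distorted reactant energies. It is an illustration, not the code from the GitHub repository; the function name `dias`, the dictionary keys, and the choice to report kcal/mol are assumptions.

```python
HARTREE_TO_KCAL = 627.5095  # conversion factor, hartree to kcal/mol


def dias(e_ts, e_distorted, e_relaxed):
    """Decompose a barrier into distortion and interaction energies.

    Hypothetical helper. All inputs are electronic energies in hartree:
      e_ts        - energy of the transition structure
      e_distorted - energies of each reactant frozen in its TS geometry
      e_relaxed   - energies of each reactant in its optimised geometry
    Returns a dict of energies in kcal/mol.
    """
    if len(e_distorted) != len(e_relaxed):
        raise ValueError("need one relaxed energy per distorted fragment")
    # Cost of deforming each reactant into its TS geometry.
    distortion = sum(d - r for d, r in zip(e_distorted, e_relaxed))
    # Stabilisation (or not) from bringing the distorted fragments together.
    interaction = e_ts - sum(e_distorted)
    # Electronic barrier relative to the separated, relaxed reactants.
    barrier = e_ts - sum(e_relaxed)
    return {
        "distortion": distortion * HARTREE_TO_KCAL,
        "interaction": interaction * HARTREE_TO_KCAL,
        "barrier": barrier * HARTREE_TO_KCAL,
    }
```

By construction, the distortion and interaction terms sum to the barrier, which gives a quick consistency check on any set of energies read from the output files.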

Funders

Engineering and Physical Sciences Research Council (EPSRC)
https://doi.org/10.13039/501100000266

Industrial CASE Account - University of Bath 2020
EP/V519637/1

Engineering and Physical Sciences Research Council (EPSRC)
https://doi.org/10.13039/501100000266

Machine Learning and Molecular Modelling: A Synergistic Approach to Rapid Reactivity Prediction
EP/W003724/1

Publication details

Publication date: 21 October 2024
by: University of Bath

Version: 1

DOI: https://doi.org/10.15125/BATH-01398

URL for this record: https://researchdata.bath.ac.uk/id/eprint/1398

Related papers and books

Espley, S. G., Allsop, S. S., Buttar, D., Tomasi, S., and Grayson, M. N., 2024. Distortion/interaction analysis via machine learning. Digital Discovery, 3(12), 2479-2486. Available from: https://doi.org/10.1039/d4dd00224e.

Related datasets and code

Espley, S., and Farrar, E., 2023. Dataset for "Machine learning reaction barriers in low data regimes: a horizontal and diagonal transfer learning approach". Version 1. Bath: University of Bath Research Data Archive. Available from: https://doi.org/10.15125/BATH-01229.

Contact information

Please contact the Research Data Service in the first instance for all matters concerning this item.

Contact person: Sam Espley

Departments:

Faculty of Science
Chemistry