Dataset for "Machine learning and semi-empirical calculations: A synergistic approach to rapid, accurate, and mechanism-based reaction barrier prediction"

All computed chemical structures used to build machine learning models that predict high-level chemical reaction barriers from low-level inputs.

Modern quantum mechanical modelling methods, such as Density Functional Theory (DFT), have provided detailed mechanistic insights into countless reactions and have been used in the design of a handful of chemical transformations. However, their computational cost inhibits their ability to rapidly screen large numbers of substrates and catalysts in reaction discovery. For a C-C bond forming Nitro-Michael addition, we introduce a synergistic semi-empirical quantum mechanical (SQM) and machine learning (ML) approach that achieves the fast and accurate prediction of DFT-quality free energy activation barriers using purely SQM-derived data. This dataset includes all the structural data, in the form of Gaussian16 (Revision A.03) output files, for the Nitro-Michael reaction used for this machine learning analysis.
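Because the structural data are supplied as Gaussian16 output files, energies can be read from them with a short script. The following is a minimal sketch, not the authors' workflow. It assumes the standard "SCF Done" line that Gaussian prints for self-consistent field calculations (molecular mechanics outputs such as UFF do not contain this line). The function names are hypothetical, and the barrier helper returns a plain electronic-energy difference rather than the free energy barriers discussed above.

```python
import re

# Matches lines such as:
#  " SCF Done:  E(RwB97XD) =  -232.123456789     A.U. after   12 cycles"
SCF_PATTERN = re.compile(
    r"SCF Done:\s+E\(([^)]+)\)\s*=\s*(-?\d+\.\d+(?:[DdEe][+-]?\d+)?)"
)

# Standard conversion factor from Hartree to kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.5095


def last_scf_energy(log_text):
    """Return (method_label, energy_in_hartree) from the final SCF Done line.

    Returns None when the text contains no SCF Done line.
    """
    matches = SCF_PATTERN.findall(log_text)
    if not matches:
        return None
    label, value = matches[-1]
    # Fortran-style exponents (D) are converted so float() can parse them.
    return label, float(value.replace("D", "E").replace("d", "e"))


def electronic_barrier_kcal_per_mol(e_ts, e_nuc, e_gs):
    """Electronic-energy barrier: E(ts) - [E(nuc) + E(gs)], in kcal/mol.

    Assumption: all three inputs are in Hartree and at the same level of theory.
    """
    return (e_ts - (e_nuc + e_gs)) * HARTREE_TO_KCAL_PER_MOL
```

In practice, `last_scf_energy` would be called on the text of each output file (for example, `open(path).read()`), taking the final SCF Done line as the energy of the optimised geometry.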

Gaussian, Computational Chemistry, Machine Learning, Michael Additions
Chemical reaction dynamics and mechanisms
Chemical synthesis

Cite this dataset as:
Farrar, E., Grayson, M., 2022. Dataset for "Machine learning and semi-empirical calculations: A synergistic approach to rapid, accurate, and mechanism-based reaction barrier prediction". Bath: University of Bath Research Data Archive. Available from:



application/zip (1GB)
Creative Commons: Attribution 4.0

Zip file containing optimised structures for each level of theory (AM1, AM1-IEFPCM, PM6, PM6-IEFPCM, UFF, wB97XD, wB97XD-IEFPCM). With the exception of the IEFPCM folders, each level of theory includes the nucleophile (nuc), 1037 Michael acceptors (gs), and 1037 Michael addition transition states for the reaction of the nucleophile with each Michael acceptor (ts), complete with IEFPCM(toluene) single point energy calculations. The IEFPCM folders include only the 37 literature reactions.
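A quick way to check a downloaded copy against the counts above is to tally the archive members by level of theory and structure type. The sketch below is illustrative only. The exact folder layout and file naming inside the zip are not documented here, so it assumes that the level of theory appears as a folder name and that the labels nuc, gs, or ts appear as a folder name or as a token in the file name. The function names are hypothetical.

```python
import re
import zipfile
from collections import Counter

# Levels of theory and structure labels as listed in the file description.
LEVELS = ("AM1", "AM1-IEFPCM", "PM6", "PM6-IEFPCM", "UFF", "wB97XD", "wB97XD-IEFPCM")
KINDS = ("nuc", "gs", "ts")


def classify(member_name):
    """Guess (level, kind) from a path inside the archive, or return None.

    Assumption: the level is an exact path component, and the kind is a
    token obtained by splitting the path on '/', '_', '.', or '-'.
    """
    parts = member_name.replace("\\", "/").split("/")
    level = next((p for p in parts if p in LEVELS), None)
    if level is None:
        return None
    tokens = re.split(r"[/_.\-]", member_name.lower())
    kind = next((t for t in tokens if t in KINDS), None)
    if kind is None:
        return None
    return level, kind


def count_structures(zip_source):
    """Count archive files per (level, kind); unclassified files are skipped."""
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            key = classify(name)
            if key is not None:
                counts[key] += 1
    return counts
```

If the layout assumption holds, `count_structures` applied to the downloaded zip should report 1037 gs and 1037 ts entries for each non-IEFPCM level, provided each structure is stored as a single file.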



University of Bath
Rights Holder


Data collection method:

Uncatalysed reactant and transition state geometries for 1000 Nitro-Michael addition reactions were built using Schrödinger’s R-Group Enumeration by varying the substituents at four positions of a generic Michael acceptor core with common organic fragments. In addition to the nucleophile, uncatalysed reactant and transition state geometries for a further 37 biologically important Nitro-Michael addition reactions from the literature were built in Gaussian16 (Revision A.03). All reactant and transition state structures were conformationally searched using Schrödinger’s MacroModel (version 12.7). All structures were subsequently optimised in Gaussian16 (Revision A.03) with several different molecular modelling methods.


Engineering and Physical Sciences Research Council (EPSRC)

DTP 2016-2017 University of Bath

Publication details

Publication date: 14 June 2022
by: University of Bath

Version: 1


URL for this record:

Related papers and books

Farrar, E. H. E., and Grayson, M. N., 2022. Machine learning and semi-empirical calculations: a synergistic approach to rapid, accurate, and mechanism-based reaction barrier prediction. Chemical Science, 13(25), 7594-7603. Available from:

Contact information

Please contact the Research Data Service in the first instance for all matters concerning this item.

Contact person: Elliot H E Farrar


Faculty of Science