<?xml version='1.0' encoding='utf-8'?>
<eprints xmlns='http://eprints.org/ep2/data/2.0'>
  <eprint id='https://researchdata.bath.ac.uk/id/eprint/1229'>
    <eprintid>1229</eprintid>
    <rev_number>79</rev_number>
    <documents>
      <document id='https://researchdata.bath.ac.uk/id/document/16984'>
        <docid>16984</docid>
        <rev_number>3</rev_number>
        <files>
          <file id='https://researchdata.bath.ac.uk/id/file/59688'>
            <fileid>59688</fileid>
            <datasetid>document</datasetid>
            <objectid>16984</objectid>
            <filename>data_archive_3_2.zip</filename>
            <mime_type>application/zip</mime_type>
            <hash>b2ec72fd43b20b86cde3473311ad096d</hash>
            <hash_type>MD5</hash_type>
            <filesize>175353643</filesize>
            <mtime>2023-04-03 14:05:55</mtime>
            <url>https://researchdata.bath.ac.uk/1229/4/data_archive_3_2.zip</url>
          </file>
        </files>
        <eprintid>1229</eprintid>
        <pos>4</pos>
        <placement>4</placement>
        <mime_type>application/zip</mime_type>
        <format>other</format>
        <formatdesc>Zip folder containing optimised ground state reactant (1,3-dipoles and dipolarophiles) and transition state structures for [3+2] cycloaddition reactions. These calculations were performed at two levels of theory (AM1 and wB97X-D/def2-TZVP).</formatdesc>
        <language>en</language>
        <security>public</security>
        <license>cc_by</license>
        <main>data_archive_3_2.zip</main>
        <content>data</content>
      </document>
      <document id='https://researchdata.bath.ac.uk/id/document/16985'>
        <docid>16985</docid>
        <rev_number>2</rev_number>
        <files>
          <file id='https://researchdata.bath.ac.uk/id/file/59694'>
            <fileid>59694</fileid>
            <datasetid>document</datasetid>
            <objectid>16985</objectid>
            <filename>example_code.zip</filename>
            <mime_type>application/zip</mime_type>
            <hash>4669cb0253c5f84029273e5bbe3e1f4d</hash>
            <hash_type>MD5</hash_type>
            <filesize>571922</filesize>
            <mtime>2023-04-03 14:06:48</mtime>
            <url>https://researchdata.bath.ac.uk/1229/5/example_code.zip</url>
          </file>
        </files>
        <eprintid>1229</eprintid>
        <pos>5</pos>
        <placement>5</placement>
        <mime_type>application/zip</mime_type>
        <format>other</format>
        <formatdesc>Zip folder containing exemplar code to perform the machine learning in the paper. The folder contains example.ipynb, endo_dataset.pkl (the extracted data in .pkl file format), environment.yaml (the .yaml file to generate the environment), and a README.md file.</formatdesc>
        <language>en</language>
        <security>public</security>
        <license>cc_by</license>
        <main>example_code.zip</main>
        <content>code</content>
      </document>
      <document id='https://researchdata.bath.ac.uk/id/document/17105'>
        <docid>17105</docid>
        <rev_number>2</rev_number>
        <files>
          <file id='https://researchdata.bath.ac.uk/id/file/60350'>
            <fileid>60350</fileid>
            <datasetid>document</datasetid>
            <objectid>17105</objectid>
            <filename>data_archive.zip</filename>
            <mime_type>application/zip</mime_type>
            <hash>22bd574f37d6e69451a9f716042c774b</hash>
            <hash_type>MD5</hash_type>
            <filesize>2105491517</filesize>
            <mtime>2023-04-24 12:47:20</mtime>
            <url>https://researchdata.bath.ac.uk/1229/6/data_archive.zip</url>
          </file>
        </files>
        <eprintid>1229</eprintid>
        <pos>6</pos>
        <placement>6</placement>
        <mime_type>application/zip</mime_type>
        <format>other</format>
        <formatdesc>Zip folder containing optimised ground state reactant (dienes and dienophiles) and transition state structures for Diels-Alder reactions. These calculations were performed at multiple levels of theory (AM1, PM3, wB97X-D/def2-TZVP, and DSD-PBEP86-D3(BJ)/def2-TZVP) and are grouped into different enumerations (enum1, enum2, enum3, enum4, and enum5). Also included is a dictionaries folder containing CSV files that pair each transition state with its respective diene and dienophile.</formatdesc>
        <language>en</language>
        <security>public</security>
        <license>cc_by</license>
        <main>data_archive.zip</main>
        <content>data</content>
      </document>
    </documents>
    <eprint_status>archive</eprint_status>
    <userid>12231</userid>
    <dir>disk0/00/00/12/29</dir>
    <datestamp>2023-05-31 11:46:04</datestamp>
    <lastmod>2026-02-07 05:54:59</lastmod>
    <status_changed>2023-05-31 11:46:04</status_changed>
    <type>data_collection</type>
    <metadata_visibility>show</metadata_visibility>
    <creators>
      <item>
        <name>
          <family>Espley</family>
          <given>Sam</given>
        </name>
        <id>sge28@bath.ac.uk</id>
        <orcid>0000-0002-1135-9890</orcid>
        <affiliation>University of Bath</affiliation>
        <contact>TRUE</contact>
      </item>
      <item>
        <name>
          <family>Farrar</family>
          <given>Elliot</given>
        </name>
        <id>ehef20@bath.ac.uk</id>
        <orcid>0000-0003-3350-2907</orcid>
        <affiliation>University of Bath</affiliation>
        <contact>FALSE</contact>
      </item>
    </creators>
    <contributors>
      <item>
        <type>Supervisor</type>
        <name>
          <family>Grayson</family>
          <given>Matthew</given>
        </name>
        <id>M.N.Grayson@bath.ac.uk</id>
        <orcid>0000-0003-2116-7929</orcid>
        <affiliation>University of Bath</affiliation>
      </item>
      <item>
        <type>Supervisor</type>
        <name>
          <family>Tomasi</family>
          <given>Simone</given>
        </name>
        <id>simone.tomasi@astrazeneca.com</id>
        <orcid>0000-0002-9373-7639</orcid>
        <affiliation>AstraZeneca</affiliation>
      </item>
      <item>
        <type>Supervisor</type>
        <name>
          <family>Buttar</family>
          <given>David</given>
        </name>
        <id>david.buttar@astrazeneca.com</id>
        <orcid>0000-0001-5466-023X</orcid>
        <affiliation>AstraZeneca</affiliation>
      </item>
    </contributors>
    <title>Dataset for &quot;Machine learning reaction barriers in low data regimes: a horizontal and diagonal transfer learning approach&quot;</title>
    <subjects>
      <item>CJ0020</item>
    </subjects>
    <divisions>
      <item>dept_chem</item>
    </divisions>
    <keywords>Machine Learning, Transfer Learning, Computational Chemistry, Diels-Alder, Gaussian</keywords>
    <abstract>Machine learning (ML) has previously been used to predict density functional theory (DFT) free energy reaction barriers for a variety of reactions from semi-empirical quantum mechanical (SQM) inputs. These models can require expensive dataset curation and can struggle to generalise outside of the dataset's immediate chemical space. One approach that can drastically lower the number of required training points is transfer learning (TL). We demonstrate that various TL techniques can be used to provide highly accurate results with a fraction of the training points required for standard ML, thus lowering the overall computational cost of barrier predictions. This dataset includes all the structural data in the form of Gaussian16 (Revisions A.03 and C.01) output files for the Diels-Alder and [3+2] cycloaddition reactions used for this ML/TL analysis. The data archive also includes exemplar code for performing some of the standard ML from the manuscript.</abstract>
    <date>2023-05-31</date>
    <publisher>University of Bath</publisher>
    <full_text_status>public</full_text_status>
    <corp_contributors>
      <item>
        <type>RightsHolder</type>
        <corpname>University of Bath</corpname>
      </item>
      <item>
        <type>RightsHolder</type>
        <corpname>AstraZeneca</corpname>
      </item>
    </corp_contributors>
    <funding>
      <item>
        <funder_name>Engineering and Physical Sciences Research Council</funder_name>
        <funder_id>https://doi.org/10.13039/501100000266</funder_id>
        <grant_id>EP/V519637/1</grant_id>
        <project_name>Industrial CASE Account – University of Bath 2020</project_name>
      </item>
      <item>
        <funder_name>Engineering and Physical Sciences Research Council</funder_name>
        <funder_id>https://doi.org/10.13039/501100000266</funder_id>
        <grant_id>EP/R513155/1</grant_id>
        <project_name>DTP 2018-19 University of Bath</project_name>
      </item>
      <item>
        <funder_name>Engineering and Physical Sciences Research Council</funder_name>
        <funder_id>https://doi.org/10.13039/501100000266</funder_id>
        <grant_id>EP/W003724/1</grant_id>
        <project_name>Machine Learning and Molecular Modelling: A Synergistic Approach to Rapid Reactivity Prediction</project_name>
      </item>
    </funding>
    <collection_method>Ground state reactant and transition state geometries for Diels-Alder reactions were built using Schrödinger’s R-Group Enumeration. R-groups were placed at various positions on both dienes and dienophiles; the positions depended on the molecules in question. All structures were built in Gaussian16 (Revisions A.03 and C.01) and were conformationally searched using Schrödinger’s MacroModel (version 12.7). All structures were subsequently optimised in Gaussian16 (Revisions A.03 and C.01) using three different molecular modelling methods (AM1, PM3, and wB97X-D/def2-TZVP). A subset of the reactions was also optimised with DSD-PBEP86-D3(BJ)/def2-TZVP.
The same process was used for the [3+2] reactions; however, these reactions were optimised only at the AM1 and wB97X-D/def2-TZVP levels of theory.</collection_method>
    <language>en</language>
    <version>1</version>
    <doi>10.15125/BATH-01229</doi>
    <related_resources>
      <item>
        <link>https://doi.org/10.1039/D3DD00085K</link>
        <type>pub</type>
      </item>
    </related_resources>
    <equipment>
      <item>
        <name>Balena High Performance Computing (HPC) System</name>
        <id>3e22ef13-31b9-4700-b57e-57bcd3b3b985</id>
      </item>
      <item>
        <name>Anatra High Performance Computing (HPC) System</name>
      </item>
    </equipment>
    <access_types>
      <item>open</item>
    </access_types>
    <resourcetype>
      <general>Dataset</general>
    </resourcetype>
    <parent>1455</parent>
  </eprint>
</eprints>
