# Computational Supporting Dataset: Atomic Insights into Aluminium-Ion Insertion in Defective Hydroxyfluorinated Anatase for Batteries

Benjamin J. Morgan (ORCID: [0000-0002-3056-8233](https://orcid.org/0000-0002-3056-8233))

This dataset contains DFT data and analysis scripts used to generate results supporting the paper “Atomic Insights into Aluminium-Ion Insertion in Defective Hydroxyfluorinated Anatase for Batteries”: [DOI TODO]().

The dataset contains inputs and outputs for a series of VASP calculations on Al-intercalated F/OH-doped anatase TiO2, and scripts for processing this DFT data to produce the related manuscript figures.

To run the main data analysis, run the following commands from the top-level directory:

```
pip install -r requirements.txt
snakemake --cores all clean
snakemake --cores all
```

## Overview

This dataset contains data and analysis for three pieces of supporting data:

1. Al intercalation energies for insertion into a range of doped-anatase TiO2 structures and insertion sites.
2. Optimised geometries of singly- and doubly-Al-inserted \[2VTi + 8FO\] paired Ti vacancy sites.
3. Coordination number analysis for Al inserted at interstitial sites adjacent to a single \[VTi + 4(F/OH)O\] site (an illustrative example of this kind of analysis is sketched after this list).
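As a minimal sketch of the coordination-number analysis, the snippet below uses pymatgen's `CrystalNN` near-neighbour finder to count the neighbours of an inserted Al ion. The structure path and the choice of `CrystalNN` here are illustrative assumptions; the analysis used for the published figures is in the Jupyter notebooks under `analysis`.

```python
# Illustrative sketch only: count the near neighbours of inserted Al ions
# using pymatgen's CrystalNN near-neighbour finder.
from pymatgen.core import Structure
from pymatgen.analysis.local_env import CrystalNN

# Hypothetical path to a relaxed geometry containing one inserted Al ion.
structure = Structure.from_file("vasp_calculations/example/CONTCAR")

cnn = CrystalNN()
for i, site in enumerate(structure):
    if site.specie.symbol == "Al":
        print(f"Al at site {i}: coordination number = {cnn.get_cn(structure, i)}")
```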
## Contents

- `vasp_calculations`: Inputs and outputs for VASP calculations. See the [README](vasp_calculations/README.md) for more details.
- `scripts`: Python scripts used in the data processing and plotting.
- `collected_data`: A set of symbolic links to the key output data in the full VASP dataset. These files are generated by the data processing workflow.
- `figures`: Plotted data.
- `analysis`: Jupyter notebooks containing thermodynamic calculations of Al intercalation energies, and analysis of Al coordination numbers.

## Dependencies

The data processing depends on the following Python packages:

- vasppy >= 0.6.1.0
- snakemake
- numpy
- version_information
- matplotlib
- pymatgen
- tqdm
- jupyter

To install these using `pip`, run

```
pip install -r requirements.txt
```

Packages can be installed using the “frozen” versions that were used for the as-published dataset via

```
pip install -r frozen_requirements.txt
```

Full system information and the Python environment used to generate the as-published analysis dataset are listed in [`system_info.txt`](system_info.txt).

## Data processing workflow

Data processing is mostly managed using [`snakemake`](https://snakemake.readthedocs.io), and uses the Jupyter notebooks in the `analysis` directory and the Python scripts in the `scripts` directory. The full data analysis workflow can be rerun with:

```
snakemake --cores all clean
snakemake --cores all
```

This workflow consists of two steps:

1. Data collation: All the VASP output files needed for the subsequent structure analysis are linked into `collected_data` as symbolic links. This uses the `vasp_summary` script from the [`vasppy`](https://pypi.org/project/vasppy/) package. This step generates:
    - `vasp_summary.yaml`: A YAML file describing key input parameters and metadata for all the VASP calculations. This file includes md5 checksums for all the files linked into the `collected_data` directory that are used in the subsequent step (a checksum-verification sketch is given at the end of this README).
    - `collected_data/*`: A set of symbolic links to the VASP output files used in subsequent plotting.
    - `system_information.txt`: System information details, generated at the time of running the analysis workflow. This file includes information about the platform architecture, the Python interpreter used to run the analysis scripts, and a list of installed Python modules and their version numbers.
2. Figure plotting: This step generates a series of figures showing plots of thermodynamic intercalation products for a range of aluminium electrochemical potentials. The figure plotting step can be run without reprocessing the full DFT dataset with:

```
snakemake --cores all -s plotting.smk clean
snakemake --cores all -s plotting.smk
```
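The md5 checksums recorded in `vasp_summary.yaml` can be used to check the integrity of the collated files. The snippet below is a minimal sketch of such a check, assuming PyYAML is available and that each record carries `file` and `md5` keys; the actual layout of the `vasppy` summary output may differ.

```python
# Illustrative sketch only: verify md5 checksums recorded in vasp_summary.yaml
# against the files in collected_data/. The record keys ("file", "md5") are
# assumed for illustration, not taken from the vasppy documentation.
import hashlib
from pathlib import Path

import yaml

with open("vasp_summary.yaml") as f:
    records = yaml.safe_load(f)

for record in records:
    path = Path("collected_data") / record["file"]  # assumed key name
    md5 = hashlib.md5(path.read_bytes()).hexdigest()
    status = "OK" if md5 == record["md5"] else "MISMATCH"  # assumed key name
    print(f"{path}: {status}")
```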