# Computational Supporting Dataset: Atomic Insights into Aluminium-Ion Insertion in Defective Hydroxyfluorinated Anatase for Batteries

Benjamin J. Morgan (ORCID: [0000-0002-3056-8233](https://orcid.org/0000-0002-3056-8233))

This dataset contains DFT data and analysis scripts used to generate results supporting the paper “Atomic Insights into Aluminium-Ion Insertion in Defective Hydroxyfluorinated Anatase for Batteries”: [DOI TODO]().

The dataset contains inputs and outputs for a series of VASP calculations on Al-intercalated F/OH-doped anatase TiO2, and scripts for processing this DFT data to produce the related manuscript figures.

To run the main data analysis, run the following commands from the top-level directory:

```
pip install -r requirements.txt
snakemake --cores all clean
snakemake --cores all
```

## Overview

This dataset contains data and analysis for three pieces of supporting data:

1. Al intercalation energies for insertion into a range of doped-anatase TiO2 structures and insertion sites.
2. Optimised geometries of singly- and doubly-Al-inserted \[2VTi + 8FO\] paired Ti vacancy sites.
3. Coordination number analysis for Al inserted at interstitial sites adjacent to a single \[VTi + 4(F/OH)O\] site (an illustrative example of this kind of analysis is sketched after this list).
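As a minimal sketch of the coordination-number analysis, the snippet below uses pymatgen's `CrystalNN` near-neighbour finder to count the neighbours of an inserted Al ion. The structure path and the choice of `CrystalNN` here are illustrative assumptions; the analysis used for the published figures is in the Jupyter notebooks under `analysis`.

```python
# Illustrative sketch only: count the near neighbours of inserted Al ions
# using pymatgen's CrystalNN near-neighbour finder.
from pymatgen.core import Structure
from pymatgen.analysis.local_env import CrystalNN

# Hypothetical path to a relaxed geometry containing one inserted Al ion.
structure = Structure.from_file("vasp_calculations/example/CONTCAR")

cnn = CrystalNN()
for i, site in enumerate(structure):
    if site.specie.symbol == "Al":
        print(f"Al at site {i}: coordination number = {cnn.get_cn(structure, i)}")
```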
## Contents

- `vasp_calculations`: Inputs and outputs for VASP calculations. See the [README](vasp_calculations/README.md) for more details.
- `scripts`: Python scripts used in the data processing and plotting.
- `collected_data`: A set of symbolic links to the key output data in the full VASP dataset. These files are generated by the data processing workflow.
- `figures`: Plotted data.
- `analysis`: Jupyter notebooks containing thermodynamic calculations of Al intercalation energies, and analysis of Al coordination numbers.

## Dependencies

The data processing depends on the following Python packages:

- vasppy >= 0.6.1.0
- snakemake
- numpy
- version_information
- matplotlib
- pymatgen
- tqdm
- jupyter

To install these using `pip`, run

```
pip install -r requirements.txt
```

Packages can be installed using the “frozen” versions that were used for the as-published dataset via

```
pip install -r frozen_requirements.txt
```

Full system information and the Python environment used to generate the as-published analysis dataset are listed in [`system_info.txt`](system_info.txt).

## Data processing workflow

Data processing is mostly managed using [`snakemake`](https://snakemake.readthedocs.io), and uses the Jupyter notebooks in the `analysis` directory and the Python scripts in the `scripts` directory. The full data analysis workflow can be rerun with:

```
snakemake --cores all clean
snakemake --cores all
```

This workflow consists of two steps:

1. Data collation: All the VASP output files needed for the subsequent structure analysis are linked into `collected_data` as symbolic links. This uses the `vasp_summary` script from the [`vasppy`](https://pypi.org/project/vasppy/) package. This step generates:
    - `vasp_summary.yaml`: A YAML file describing key input parameters and metadata for all the VASP calculations. This file includes md5 checksums for all the files linked into the `collected_data` directory that are used in the subsequent step (a checksum-verification sketch is given at the end of this README).
    - `collected_data/*`: A set of symbolic links to the VASP output files used in subsequent plotting.
    - `system_information.txt`: System information details, generated at the time of running the analysis workflow. This file includes information about the platform architecture, the Python interpreter used to run the analysis scripts, and a list of installed Python modules and their version numbers.
2. Figure plotting: This step generates a series of figures showing plots of thermodynamic intercalation products for a range of aluminium electrochemical potentials. The figure plotting step can be run without reprocessing the full DFT dataset with:

```
snakemake --cores all -s plotting.smk clean
snakemake --cores all -s plotting.smk
```
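The md5 checksums recorded in `vasp_summary.yaml` can be used to check the integrity of the collated files. The snippet below is a minimal sketch of such a check, assuming PyYAML is available and that each record carries `file` and `md5` keys; the actual layout of the `vasppy` summary output may differ.

```python
# Illustrative sketch only: verify md5 checksums recorded in vasp_summary.yaml
# against the files in collected_data/. The record keys ("file", "md5") are
# assumed for illustration, not taken from the vasppy documentation.
import hashlib
from pathlib import Path

import yaml

with open("vasp_summary.yaml") as f:
    records = yaml.safe_load(f)

for record in records:
    path = Path("collected_data") / record["file"]  # assumed key name
    md5 = hashlib.md5(path.read_bytes()).hexdigest()
    status = "OK" if md5 == record["md5"] else "MISMATCH"  # assumed key name
    print(f"{path}: {status}")
```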