# DFT dataset: Impact of Anionic Vacancies on the Local and Electronic Structures of Iron-based Oxyfluoride Electrodes

Benjamin J. Morgan

ORCID: [0000-0002-3056-8233](https://orcid.org/0000-0002-3056-8233)

This dataset contains DFT data and analysis scripts used to generate results supporting the paper “Impact of Anionic Vacancies on the Local and Electronic Structures of Iron-based Oxyfluoride Electrodes”: [DOI](https://dx.doi.org/10.1021/acs.jpclett.8b03503).

The dataset includes inputs and outputs for a series of VASP calculations on anion-substituted hexagonal-tungsten-bronze structured FeF3, and scripts for processing this DFT data to produce the related manuscript figures.

## Contents

- `vasp_data`: Inputs and outputs for VASP calculations. See the [README](vasp_data/README.md) for more details.
- `scripts`: Python scripts used in the data processing and plotting.
- `collected_data`: A set of symbolic links to the key output data in the full VASP dataset. These files are generated by the data processing workflow.
- `figures`: Plotted data.

## Dependencies

The data processing depends on the following packages:

- snakemake
- vasppy
- pymatgen

To install these packages using `pip`, run

```
pip install -r requirements.txt
```

Packages can be installed using the “frozen” versions used for the published dataset, via

```
pip install -r frozen_requirements.txt
```

The figure plotting uses LaTeX (via matplotlib), which will need to be installed on your system to reproduce the full data processing and analysis workflow.

## Data processing workflow

Data processing is managed using [`snakemake`](https://snakemake.readthedocs.io), and uses the Python scripts in the `scripts` directory. The full data analysis workflow can be rerun with:

```
snakemake clean
snakemake -j
```

This workflow consists of two steps:

1. Data collation: All the VASP output files needed for subsequent plotting are copied to `collected_data` as symbolic links.
   This uses the `vasp_summary` script from the [`vasppy`](https://pypi.org/project/vasppy/) package. This step generates:

   - `vasp_summary.yaml`: A YAML file describing key input parameters and metadata for all the VASP calculations. This file includes md5 checksums for all the files copied to the `collected_data` directory that are used in the subsequent step.
   - `collected_data/*`: A set of symbolic links to the VASP output files used in subsequent plotting.
   - `system_information.txt`: System information details, generated at the time of running the analysis workflow. This file includes information about platform architecture, the Python interpreter used for running the analysis scripts, and a list of installed Python modules and version numbers.

2. Figure plotting: This step generates a series of figures showing plots of calculated projected densities of states (pDOS) and energy-dependent absorption coefficients.

The figure plotting step can be run without reprocessing the full DFT dataset with:

```
snakemake -s plotting.smk clean
snakemake -s plotting.smk
```

The volume-energy fits performed for each system to the Murnaghan equation of state can be rerun using the `murn.smk` snakefile:

```
snakemake -s murn.smk clean
snakemake -s murn.smk
```
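
The md5 checksums recorded during data collation make it possible to confirm that the files under `collected_data` match those used for the published figures. The following is a minimal sketch of such a check, not part of the published workflow. The helper names `md5_checksum` and `find_mismatches` are hypothetical, and the `expected` mapping of filename to md5 digest is an assumption: how those values are read out of `vasp_summary.yaml` depends on that file's layout, which is not described here.

```python
import hashlib
from pathlib import Path


def md5_checksum(path, chunk_size=65536):
    """Return the md5 hex digest of the file at `path`, read in chunks.

    Opening a symbolic link reads the file it points to, so this also
    works for the links in `collected_data`.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected, base_dir="collected_data"):
    """Return a sorted list of files that are missing or have a different digest.

    `expected` is a hypothetical mapping of relative filename -> md5 hex string.
    """
    base = Path(base_dir)
    mismatches = []
    for name, digest in expected.items():
        target = base / name
        if not target.is_file() or md5_checksum(target) != digest.lower():
            mismatches.append(name)
    return sorted(mismatches)
```

An empty list returned by `find_mismatches` would indicate that every listed file is present and unchanged. Reading files in chunks keeps memory use low for large VASP outputs.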