Appraising and selecting data

Research data come in all shapes and sizes, and the value of a particular type of data can vary between disciplines, researchers and projects. For any given circumstance, though, it is important to distinguish between incidental data and those that have long-term value, as keeping everything can have negative consequences:

To decide which data to archive and which to discard, you should consider a number of factors.

Selecting data for retention

All data supporting a publication, including both quantitative and qualitative data, should be retained and made accessible where possible. These data should be sufficient to enable other researchers to reproduce or validate your findings. This is often a subset of the data generated over the course of a research project.

As well as preserving the data that support publications, you should also archive data with acknowledged long-term value. Determining the potential value of a dataset is a matter of judgement, but there are several areas that should be considered.

Reproducibility

If it would be impossible or unreasonably expensive to reproduce or recreate the data, they should be retained. Examples of data which can fall into this category include the following:

As a rule of thumb, if a significant proportion of the funding for a project is for data generation, then the data are probably valuable and should be retained.

Importance

It is not always clear whether data have historical or future significance when selecting data for retention. Indicators that data might be valuable because of their importance include the following:

Persistence

If the data are re-used regularly by your group or by others in your field, they should be archived to ensure that authoritative versions are clearly defined and that they are available for future users. Examples include the following:

Some data are required by law to be held for a period of time.

Confidentiality

If the data are confidential, they should not be shared, but it may be valuable to retain them. Personal data collected for research purposes can be retained for future analysis, and there is an exemption in the Data Protection Act for this purpose. However, if no future analysis is planned, then the data should be securely disposed of. Some funders require you to keep personal data for a fixed period of time, before securely destroying them.

You must keep consent forms for as long as data are retained in a form where they could be linked to individuals. Once the data only exist in anonymised form, we recommend that you retain the wording of the consent forms (e.g. a blank form) but dispose of the completed forms.

If collaboration agreements include data retention and sharing permissions, they should be retained while you still hold the data.

Selecting data for disposal

Not all data are suitable for retaining after the end of a project.

Reproducibility

If the data can be reproduced easily and cheaply, you may not need to retain them. Likewise, data that are so large that the cost of retaining them outweighs the cost of recreating them should not be kept.

If you dispose of this kind of data, it is important to ensure you have a full and clearly described method for how to recreate the data.
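The cost comparison described above can be sketched as simple arithmetic. The example below is only an illustration: the class, function names, storage price, retention period and recreation cost are all hypothetical, and real figures would need to come from your storage provider and project budget.

```python
from dataclasses import dataclass


@dataclass
class DatasetCosts:
    """Illustrative cost estimates for one dataset (every value is an assumption)."""
    size_tb: float                    # dataset size in terabytes
    storage_cost_per_tb_year: float   # hypothetical archive price per TB per year
    retention_years: int              # how long the data would be held
    recreation_cost: float            # estimated cost of regenerating the data


def retention_cost(costs: DatasetCosts) -> float:
    """Total estimated cost of keeping the dataset for the whole retention period."""
    return costs.size_tb * costs.storage_cost_per_tb_year * costs.retention_years


def cheaper_to_recreate(costs: DatasetCosts) -> bool:
    """True when keeping the data is estimated to cost more than recreating them."""
    return retention_cost(costs) > costs.recreation_cost


# Hypothetical large simulation output: expensive to store, cheap to regenerate.
simulation_output = DatasetCosts(
    size_tb=50, storage_cost_per_tb_year=100, retention_years=10, recreation_cost=5000
)

# Hypothetical small field survey: cheap to store, expensive to repeat.
field_survey = DatasetCosts(
    size_tb=0.01, storage_cost_per_tb_year=100, retention_years=10, recreation_cost=20000
)

print(cheaper_to_recreate(simulation_output))
print(cheaper_to_recreate(field_survey))
```

A comparison like this only covers cost. Even when recreation is cheaper, disposal is only appropriate if the method for recreating the data is fully documented, and the other factors in this guide (importance, legal requirements, confidentiality) still apply.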

Importance

If the data are superfluous or unsuitable for supporting published findings, then they should not be retained. Examples include the following:

Third-party data

If your research data have been obtained from a third party, and there are no provisions in the licence or agreement for retaining or sharing the data, then they should be securely destroyed at the end of the project. You may be able to share derived datasets, such as those created through content mining.

Format transfer

Some data are not suitable for retaining in their original formats, or might be more useful in another format. Some examples include the following:

For more information about digitisation, see our guides to working with non-digital data and archiving or disposing of non-digital data.

Confidentiality

The Data Protection Act requires that personal data should be securely disposed of once they are no longer required for the purpose for which they were collected. As mentioned above, however, personal data collected for research purposes can be retained if future analysis is felt to be likely.

Funding agreements, consent forms or collaboration agreements may contain additional stipulations about when sensitive data should be securely destroyed.

Next steps

You should register in Pure any data you have selected to keep. This ensures that we know that the dataset exists within the institution, and is important for compliance with several funders' data policies.

Example case of appraising and selecting data

Researchers in the Department of Architecture and Civil Engineering published a paper describing how the reaction of a chemical compound to exposure was studied both experimentally and through computer simulation. The underlying data consisted of scanning electron microscopy (SEM) images, specimen composition data obtained from energy-dispersive X-ray spectroscopy (EDS), X-ray photoelectron spectroscopy (XPS) results and PHREEQC simulation data.

The journal could accommodate a limited number of images and tabular data within the paper, but would not provide storage for supplementary data. The researchers therefore had to decide which data to submit to the University of Bath Research Data Archive to comply with the EPSRC's expectations. Considering the nature of the techniques used, they judged that the data might be useful for validating their findings but would not be useful as the basis for future research. This influenced the selections they made:

Further information about appraisal and selection