Describing your data

Writing documentation to accompany research data is a vital step in managing them correctly. It can also be one of the hardest, but it becomes much simpler if you work out in advance what you will want to say.

At the beginning of your project, think about the sorts of documentation you will need:

You may find it helpful to make a list of what you will need to record, and fill out the corresponding information as and when you know it, so you have it all in place when you are eventually asked for it. Indeed, some information is much easier to collect at the time you are working with the data than after the fact.

Choosing what information to include

When documenting your data, the aim is to provide enough information so that a fellow researcher who is familiar with your field, but not necessarily your work within it, should be able to understand the data, interpret them correctly, and use them in new research. You may find it easier to consider what you would need to know in order to use someone else's data in your research. Typically this will include the method used to collect the data and how they have been recorded, structured, processed or manipulated. You may also need to provide some broader context to explain the motivation for the design decisions you have taken and the significance of what you found.

More specifically, you may need to include some of the following elements:

You may be recording some of this information in a lab notebook or research journal. If so, you may find it convenient to maintain an index file that links data files to the corresponding page numbers until you have an opportunity to transfer the information into a documentation file.
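As a rough illustration, an index file of this kind could be a small CSV table with one row per data file. The sketch below is only one possible layout: the column names, file names and notebook labels are hypothetical placeholders, not a required format.

```python
import csv
import io


def build_index(entries):
    """Return CSV text that maps each data file to its notebook pages."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Column names are illustrative assumptions, not a standard.
    writer.writerow(["data_file", "notebook", "pages"])
    # Sort by file name so the index is easy to scan.
    for data_file, notebook, pages in sorted(entries):
        writer.writerow([data_file, notebook, pages])
    return buffer.getvalue()


# Hypothetical entries: (data file, notebook, page numbers).
entries = [
    ("run_02.csv", "Lab notebook 3", "47"),
    ("run_01.csv", "Lab notebook 3", "45-46"),
]

index_text = build_index(entries)
print(index_text)
```

You could save the resulting text alongside the data files and add a row each time you create a new file.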

Choosing where to record the information

Depending on the context, there are several places where you can record the documentation:

Writing a readme file

A readme file is a plain text file that is named 'readme' to encourage users to read it before looking at the remainder of the content. It can contain documentation directly or tell the reader where to find more information. Even though it is free text, the file should be structured into sections as an aid to the reader. The following are suggestions for what to include:

The University of Bath Research Data Archive contains some examples of readme files you can look at for inspiration:

The University of Minnesota provides an example readme file template.
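If you want to generate a readme skeleton rather than type it by hand, a minimal sketch along the following lines may help. It simply shows one way of laying out a plain text file in sections. The section headings and placeholder text here are assumptions made for illustration; they are not taken from the templates mentioned above.

```python
def render_readme(title, sections):
    """Return plain text with an underlined title and underlined section headings."""
    lines = [title, "=" * len(title), ""]
    for heading, body in sections:
        lines.extend([heading, "-" * len(heading), body, ""])
    return "\n".join(lines)


# Hypothetical section headings and placeholder text.
sections = [
    ("Summary", "One paragraph describing what the dataset contains."),
    ("Files", "A list of the files and what each one holds."),
    ("Methods", "How the data were collected and processed."),
]

readme_text = render_readme("Example dataset", sections)
print(readme_text)
```

The output can be written to a file named 'readme.txt' at the top level of the dataset and then filled in as the project progresses.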

Writing a structured metadata file

Metadata are the information someone would need about some data in order to perform a specific task with them, such as discovering or preserving them. Metadata are most useful when they have been structured, that is, arranged as properties and values.
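To make 'properties and values' concrete, the sketch below stores a few pieces of metadata as a set of named properties and writes them out as JSON. The property names and values are hypothetical and do not follow any particular metadata standard.

```python
import json

# Each key is a property and each entry is its value.
# All names and values here are illustrative placeholders.
metadata = {
    "title": "Example dataset",
    "creator": "A. Researcher",
    "date_collected": "2020-01-15",
    "keywords": ["example", "placeholder"],
}

# Serialise with sorted keys so the file is stable and easy to compare.
metadata_json = json.dumps(metadata, indent=2, sort_keys=True)
print(metadata_json)
```

Because the information is held as properties and values rather than free text, software can read an individual property, such as the title, without having to interpret a paragraph of prose.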

As a researcher, you will mainly be asked to provide three types of metadata: contextual metadata, discovery metadata, and metadata for reuse.

You will provide contextual metadata when you create a record of a dataset in Pure. This helps to connect your data to your own profile, and to your project, funding body and publications.

You will provide discovery metadata when you complete a record in the University of Bath Research Data Archive. This helps other researchers to find your data, and as a result may help to increase the impact of your research.

The metadata you provide for reuse will depend on the field of your research:

If you decide or are required to offer your data to a subject-specific data centre, you should contact them in the early stages of your project to discuss their metadata requirements. This can save a lot of additional work later on as some metadata can only be collected accurately at the point of data creation.

For more information about the data and metadata standards available for your subject area, see the following directories:

Using standardised vocabularies

As an aid to clarity, some subject areas have agreed on a common set of terminology to use when describing data. Where metadata standards list the properties that need to be known, vocabularies help you to provide useful values for those properties.
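One practical use of a standardised vocabulary is to check that the values you have recorded match the agreed terms. The following is a minimal sketch of such a check. The vocabulary, the field name and the records are all invented for illustration; a real vocabulary would come from your subject community.

```python
# Hypothetical vocabulary of permitted unit names.
ALLOWED_UNITS = {"kelvin", "metre", "second"}


def find_unknown_values(records, field, vocabulary):
    """Return the sorted values of `field` that are not in the vocabulary."""
    return sorted({r[field] for r in records if r[field] not in vocabulary})


# Hypothetical metadata records, two of which use non-standard terms.
records = [
    {"variable": "temperature", "unit": "kelvin"},
    {"variable": "depth", "unit": "meters"},
    {"variable": "duration", "unit": "second"},
    {"variable": "surface_temp", "unit": "Kelvin"},
]

unknown = find_unknown_values(records, "unit", ALLOWED_UNITS)
print(unknown)
```

Here the check is case-sensitive and exact, so 'Kelvin' and 'meters' are both reported. Whether to accept such variants is a decision for you and your chosen vocabulary.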

Further information about describing data