Organising your data files

Good file and folder organisation will help you to locate, identify and retrieve your data quickly and accurately, making it easier for you to manage your data. To achieve this, you need to do at least two things:

You should establish a file organisation scheme at the start of your project, to avoid having to apply one retrospectively:

Once you have set up a file organisation scheme, you should document it: write down what should go in each folder, the naming conventions you are using, and any codes or abbreviations. Save it in a ‘readme’ file, preferably in plain text, and store it in the top-level folder for your project where you (or anyone in your group) will be able to access it easily.
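As an illustration only, a readme of this kind could be generated with a short Python script. The sketch below assumes a file called README.txt and a simple two-part layout (folders, then conventions); the function name, the file name and the layout are hypothetical choices, not a prescribed format.

```python
from pathlib import Path


def write_readme(project_root: Path, folders: dict[str, str], conventions: list[str]) -> Path:
    """Write a plain-text readme describing folders and naming conventions."""
    lines = ["File organisation scheme", "", "Folders:"]
    for name, purpose in folders.items():
        # One line per folder: what should go in it.
        lines.append(f"  {name}/ : {purpose}")
    lines += ["", "Naming conventions and abbreviations:"]
    for rule in conventions:
        lines.append(f"  - {rule}")
    # README.txt is an assumed file name; it is stored in the top-level folder.
    readme = project_root / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Calling write_readme with the project folder, a dictionary of folder names and purposes, and a list of conventions produces a file that can be opened in any text editor and updated whenever the scheme is reviewed.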

Consider scheduling a regular review of your file organisation scheme:

Although these principles are aimed at digital files and folders, it is just as important to organise physical files, folders and other materials in a meaningful, consistent and documented fashion.

Structuring files and folders

There are many "right" ways of organising your files, so think about what makes sense for your research.

If you are doing experimental work, for example, you might want to organise the results into folders by the date you did the experiment, or by a key experimental condition.

The following suggestions will help you to organise your data:

Unlike with physical records, it is possible for digital folders to appear in more than one place in the hierarchy by means of shortcut links. This can help if different members of a group need the files to be organised in different ways, but the technique should be used sparingly.

Further guidance on structuring files and folders

You may find the following external guidance useful.

Naming files and folders

Naming conventions are rules that allow electronic and physical records to be named in a consistent and logical way.

Use of consistent and meaningful names will enable you to identify and distinguish between similar records, making data retrieval easier.

If you create large numbers of data files that would be difficult to rename individually, apply your naming convention at the folder level instead.

When you agree your naming convention, consider the following suggestions:

If you are likely to have multiple versions of the same file, see our guidance below on keeping track of versions.

Applying a file naming convention

Consider a folder containing the following files:

This is an extreme example with many problems. Dates are written in many different formats, and files are described inconsistently, meaning that related files are not grouped together. This makes it hard to tell at a glance which file is which. In addition, some files have quite mysterious names and some use characters that may cause problems.

Contrast that with the consistently named files below.

Underscores are used to separate blocks of information, while hyphens are used within blocks. Note how materials from the same day and concerning the same subject are grouped together, and that corresponding files in the different groups are easy to spot.
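The separator rule described above can be sketched in Python. This is a minimal illustration: the function names are hypothetical, and writing the date as YYYY-MM-DD and placing it first are assumptions made here so that names sort chronologically.

```python
import re
from datetime import date


def clean_block(text: str) -> str:
    """Join the words of one block with hyphens, dropping other characters."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return "-".join(words)


def build_name(day: date, blocks: list[str], extension: str) -> str:
    """Join a date block and descriptive blocks with underscores."""
    # Assumption: the date is written YYYY-MM-DD and placed first.
    parts = [day.isoformat()] + [clean_block(block) for block in blocks]
    return "_".join(parts) + "." + extension.lstrip(".")


print(build_name(date(2024, 3, 7), ["Interview notes", "Site A"], "txt"))
# prints: 2024-03-07_Interview-notes_Site-A.txt
```

Because spaces and punctuation inside a block are replaced by hyphens, an underscore in the resulting name always marks a boundary between blocks.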

Further guidance on file naming conventions

You may find the following external guidance useful.

Keeping track of versions

As you work with your data it is important to distinguish between different versions or drafts of your files. Version control can help you to easily identify the current version of your data so that you avoid working on older or outdated copies. If you are working with others it can also help to link versions of the data to the time and author of the change.

There are a number of ways that different versions of data can be managed:

File naming: A simple method of version control is to create a duplicate copy and then update version information to create a unique file or folder name.
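A minimal Python sketch of this approach follows. It assumes that versions are recorded as a suffix of the form _v01 at the end of the file name, before the extension; that suffix format and the function names are hypothetical, so adapt them to your own convention.

```python
import re
import shutil
from pathlib import Path

# Assumed convention: the version is a "_v" suffix followed by digits, e.g. "_v01".
VERSION_PATTERN = re.compile(r"_v(\d+)$")


def next_version_name(filename: str) -> str:
    """Return the file name with its version suffix increased by one."""
    path = Path(filename)
    match = VERSION_PATTERN.search(path.stem)
    if match:
        digits = match.group(1)
        stem = path.stem[: match.start()]
        number = int(digits) + 1
        width = len(digits)  # keep the same zero padding
    else:
        # A file with no version suffix gets "_v01" on its first copy.
        stem, number, width = path.stem, 1, 2
    return f"{stem}_v{number:0{width}d}{path.suffix}"


def save_new_version(current: Path) -> Path:
    """Duplicate a file under the next version name and return the new path."""
    new_path = current.with_name(next_version_name(current.name))
    if new_path.exists():
        # Refuse to overwrite an existing version.
        raise FileExistsError(f"{new_path} already exists")
    shutil.copy2(current, new_path)
    return new_path
```

For example, next_version_name("survey-results_v01.csv") returns "survey-results_v02.csv". The earlier version is left untouched, so you edit only the new copy.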

Version control tables: These are included within documents and can capture more information than using file naming conventions. Version control tables typically include the new version number, date of the change, person who made the change and the nature or purpose of the change.
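For plain-text documents, a table with these four columns could be kept as structured records and rendered as aligned text. The sketch below is one possible layout; the class and function names are hypothetical, and the date format is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class VersionEntry:
    version: str      # new version number
    changed_on: date  # date of the change
    author: str       # person who made the change
    change: str       # nature or purpose of the change


def format_version_table(entries: list[VersionEntry]) -> str:
    """Render the entries as a plain-text table with aligned columns."""
    header = ("Version", "Date", "Author", "Change")
    rows = [header] + [
        (e.version, e.changed_on.isoformat(), e.author, e.change) for e in entries
    ]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]
    return "\n".join(lines)
```

The returned text can be pasted at the top of a document or saved alongside the data, with one new entry added for each change.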

Version control systems: There are many automated systems available that can store a repository of files and monitor access to them, logging who made what change and when. Version control systems are particularly useful for collaborative development of code or software. Computing Services provide an institutional GitHub service. Please contact Computing Services via the IT help form to arrange access.

Further guidance on version control

You may find the following external guidance useful.

Example case of organising data files

The principal investigator (P.I.) of a large multi-institutional project was faced with the issue that each partner would hold an overlapping subset of the project data, with files shared with other partners as needed. In order to ensure consistency and coordination across the partner institutions, the P.I. drafted a folder structure and file naming convention.

The research workflow would involve taking detailed measurements of samples, and analysing the raw data in a variety of ways to generate different characterisations. The data might therefore need to be accessed by sample, characterisation technique, or characterisation purpose. The raw data would also need to be kept separate from derived data, to protect the raw data from inadvertent changes and permit timely sharing between partners.

The convention chosen was as follows. Within the project folder, subfolders were created for each work package ('WP1', 'WP2', etc.), for raw data ('Raw_Data'), and for characterisation templates ('Templates'):
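As an illustration, the top-level folders named above could be created consistently at each partner institution with a short Python script. This is a sketch under stated assumptions: the function name and the number of work packages are hypothetical, and only the folder names 'WP1', 'WP2', 'Raw_Data' and 'Templates' come from the convention described here.

```python
from pathlib import Path


def create_project_folders(project_root: Path, work_packages: int) -> list[Path]:
    """Create one folder per work package plus Raw_Data and Templates."""
    names = [f"WP{n}" for n in range(1, work_packages + 1)]
    names += ["Raw_Data", "Templates"]
    created = []
    for name in names:
        folder = project_root / name
        # exist_ok=True makes the script safe to run more than once.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Running the same script at every site gives each partner an identical skeleton, so shared files arrive in a structure that is already familiar.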

We acknowledge the work of the UK Data Service, the University of Glasgow, the University of Leicester and the University of Southampton in the development of this guidance.