Preservation Policy

Outline

This policy outlines the principles that underpin the main activities of the University’s Research Data Archive (the Archive) in the active preservation of data for use and re-use. The Archive is in a pilot phase for its digital preservation activities and this policy will be reviewed in 2016 when more is known about requirements.

Purpose

The Archive exists to support high-quality research produced by members of the University by acquiring, developing and managing data and encouraging their re-use. In order to achieve these aims, the primary function of the Archive is to store and preserve data in the long term and to support discovery through the use of standard and enriched metadata.

Scope

The scope of this policy is limited to the Archive’s data collections. It deals with all aspects of preservation and applies to all materials held by the Archive on behalf of the University of Bath and its collaborators.

Objectives

This policy is designed to guide preservation activities towards recognised quality standards. It is intended to provide a framework in which to develop robust preservation processes, while ensuring the safekeeping of datasets deposited in the Archive during the pilot phase.

Requirements

The Archive requires that:

Funding and Resource Planning

The Archive is supported by two permanent, full-time members of staff. The Arkivum appliance and the Archive server are maintained by permanent staff in Computing Services. Storage in the Arkivum service is paid for on a yearly basis during the pilot phase (running until summer 2016); however, under the terms of the agreement, all data can be withdrawn should the service not be continued. If the Archive is closed down, the database will be transferred to another appropriate archive.

Roles and Responsibilities

The University’s Research Data Archive was established in 2015 to provide a reliable archive for datasets in research areas where specialist external archives do not exist. The Library manages the Archive within the Research Services section, under the responsibilities of the following staff:

Research Data Librarian (Systems)

Develops and maintains the Data Archive, and processes deposits. Develops processes, documentation and guidance for use of the system, and responds to enquiries. Liaises with external providers of related services.

Senior Data Librarian

Works closely with the Research Data Librarian (Systems) to ensure the Archive meets institutional and external requirements for data preservation.

Computing Services maintain the servers in accordance with their policies.

Functions

These functions are broadly based on the functions of the Open Archival Information System (OAIS) reference model, as expanded by the UK Data Archive (UKDA).

Pre-ingest function

Training and skills development of depositors is at the core of current preservation work, and is a well-established service at the University. Key activities include:

All deposits in the Archive are mediated and reviewed manually.

Ingest function

Depositors create a record in Pure to register a dataset. This record is imported into the Archive and used as the basis for the full data deposit. Files and full metadata are input by the depositor and checked by Archive staff for sufficiency and organisation. Where data files are in a common, non-specialist format, they are visually inspected to ensure the contents have not become corrupted during ingest. Data are not checked for accuracy, anonymity or any other aspect of their content, and responsibility for the content of the files remains with the depositor.

Copies of files may be made at this point for preservation purposes. Typically, this would involve creating unzipped copies of files or converting proprietary formats to more universal formats. This practice is not yet applied consistently, and the pilot process will endeavour to identify where it should normally be carried out.

Once published, files are copied into the external preservation service, Arkivum.

Data Management function

A new version is required when the data or the mandatory DataCite metadata need to change (unless permission to change them has been agreed). The software provides links between different versions. Data and metadata can be deleted only where a DOI has not been created for a dataset. Otherwise, at minimum a metadata record with mandatory DataCite metadata is maintained, with a statement about the withdrawal of the dataset from the collection.
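
The withdrawal rule above can be illustrated with a minimal sketch. This is not the Archive software's implementation: the Record class, the withdraw function and the metadata key names are hypothetical, and the mandatory field list assumes the DataCite v3.1 mandatory properties (Identifier, Creator, Title, Publisher, PublicationYear).

```python
from dataclasses import dataclass
from typing import Optional

# Assumed key names for the DataCite v3.1 mandatory properties (hypothetical spelling).
MANDATORY_FIELDS = ("identifier", "creators", "title", "publisher", "publication_year")


@dataclass
class Record:
    """Hypothetical stand-in for an Archive record."""
    metadata: dict
    files: list
    doi: Optional[str] = None
    withdrawal_statement: Optional[str] = None


def withdraw(record: Record, reason: str) -> Optional[Record]:
    """Apply the withdrawal rule to a record.

    Returns None when the record may be deleted outright (no DOI was created).
    Otherwise returns a tombstone: mandatory metadata only, no files, and a
    statement explaining the withdrawal.
    """
    if record.doi is None:
        return None
    kept = {key: record.metadata[key] for key in MANDATORY_FIELDS if key in record.metadata}
    return Record(metadata=kept, files=[], doi=record.doi, withdrawal_statement=reason)
```

In this sketch a record with a DOI is never removed entirely, so the DOI continues to resolve to a page that explains what happened to the dataset.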

Archival storage function

Archival storage is outsourced to Arkivum.

Files are preserved as follows:

  1. The file deposited in EPrints is copied to the on-site Arkivum Appliance.
  2. At volume-based trigger points, the files are replicated from the Appliance to the Arkivum data centres.
  3. Once these have been verified as accurate copies by Arkivum, escrow copies are made automatically through the Arkivum service.
  4. On completion of the process, the files are deleted from EPrints local storage, and bit-level preservation is guaranteed, backed with insurance. The Appliance retains copies of recently accessed files in its cache.
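
The principle behind steps 3 and 4, that a local copy is removed only after the replica has been verified as an accurate copy, can be sketched as a checksum comparison. This is an illustration only: verification is performed by the Arkivum service itself, and the function names and the choice of SHA-256 here are assumptions, not a description of Arkivum's method or API.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safe_to_delete_local(local: Path, replica: Path) -> bool:
    """Return True only if the replica exists and matches the local file bit for bit."""
    return replica.exists() and sha256_of(local) == sha256_of(replica)
```

Reading in chunks keeps memory use constant, which matters for large research datasets.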

Access function

Metadata about datasets are normally published and, where this is the case, are released under a CC0 licence to encourage re-use. Metadata are mapped to the DataCite metadata schema v3.1 and users can export record information in that format.

Where a DOI has been created, metadata are also exported to the DataCite Metadata Store.

Information about access to the dataset is included with each record, and is part of the record validation process. End users can use the request access button to contact the Archive regarding access to restricted data.

Preservation planning function

Data files are not routinely migrated to other formats, and the commitment is only to bit-level preservation of datasets. This position reflects the current capacity of the service; it is an area for future review and is part of the pilot process to assess future demands. Bit-level preservation is outsourced to Arkivum. Metadata are routinely backed up by Computing Services. Preservation copies of files are made in addition to the original copies of the files unless the conversion has been made before deposit and agreed as the correct version with the depositor. For audit purposes, the Archive software records all changes made to records.

Administration function

Administrative metadata are maintained for each record, and remain visible to Archive staff only. The Archive records information about retention and curation decisions as part of the administrative metadata for each item. For retention, this includes recording the length of time before review, the date from which this period of time should run, and the reason for the retention period being set. For curation, this includes recording the date of the decision, the type of decision and a reason for its being made. A free-text additional information field records any administrative information received about the dataset to aid in future decision-making processes.
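
As an illustration of the retention information described above, the following minimal sketch models a retention decision and derives the review date from it. The class and field names are hypothetical and do not reflect the Archive software's actual schema. The sketch also assumes that retention periods are expressed in whole years.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionDecision:
    """Hypothetical record of a retention decision for one dataset."""
    period_years: int   # length of time before review (assumed to be whole years)
    start_date: date    # date from which the period runs
    reason: str         # why this retention period was set

    def review_date(self) -> date:
        """Return the date on which the dataset falls due for review."""
        target_year = self.start_date.year + self.period_years
        try:
            return self.start_date.replace(year=target_year)
        except ValueError:
            # A start date of 29 February in a non-leap target year: use 28 February.
            return self.start_date.replace(year=target_year, day=28)
```

For example, a decision with a ten-year period starting on 1 June 2015 would fall due for review on 1 June 2025.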

Monitoring and review

The Archive Preservation Policy will be reviewed in summer 2016 as part of the pilot process for the Archive.

Feedback

Queries concerning the Archive should be directed to the Library Research Data team:

Email: research-data@bath.ac.uk

Post: Research Services, The Library, University of Bath, Bath BA2 7AY