At a broad level, data are items of recorded information considered collectively for reference or analysis. Data can take a variety of forms, including, but not limited to:
- notebooks
- survey responses
- software and code
- measurements from laboratory or field equipment (such as IR spectra or hygrothermograph charts)
- images (such as photographs, films, scans, or autoradiograms)
- audio recordings
- physical samples
Research data management (RDM) describes the organization, storage, preservation, and sharing of the data collected and used in research, from the moment the data enter the research cycle through to the dissemination and archiving of valuable results. It involves the everyday management of research data during the lifetime of a research project (for example, using consistent file naming conventions). It also involves decisions about how data will be preserved and shared after the project is completed (for example, depositing the data in a repository for long-term archiving and access). Research data management is part of the research process itself: it aims to make that process as efficient as possible and to meet the expectations and requirements of the university, research funders, and legislation.
It concerns how you:
- Create data and plan for its use,
- Organise, structure, and name data,
- Keep it – make it secure, provide access, store and back it up,
- Find information resources, share data with collaborators and more widely, publish, and get cited.
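Naming data consistently, one of the everyday practices listed above, is easy to check automatically once a convention has been agreed. The sketch below assumes a hypothetical convention, `<project>_<YYYYMMDD>_<description>_v<NN>.<ext>`, chosen only for illustration; your project, institution, or discipline may use a different one. It builds names that follow the convention and flags names that do not. The zero-padded version number also keeps versions in order when files are sorted by name.

```python
import re
from datetime import datetime

# Hypothetical naming convention (an assumption for illustration, not a standard):
#   <project>_<YYYYMMDD>_<description>_v<NN>.<ext>
#   e.g. soilsurvey_20240315_site-a-moisture_v02.csv
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_"
    r"(?P<date>\d{8})_"
    r"(?P<description>[a-z0-9-]+)_"
    r"v(?P<version>\d{2,})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def parse_filename(name):
    """Return the parts of a file name as a dict, or None if it breaks the convention."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    try:
        # Reject impossible dates such as 20241399 that the pattern alone would accept.
        datetime.strptime(parts["date"], "%Y%m%d")
    except ValueError:
        return None
    parts["version"] = int(parts["version"])
    return parts


def make_filename(project, date, description, version, ext):
    """Build a conforming name; `date` is a datetime.date or datetime.datetime."""
    return f"{project}_{date:%Y%m%d}_{description}_v{version:02d}.{ext}"
```

For example, `parse_filename("Final data NEW (2).xlsx")` returns `None`, while `parse_filename("soilsurvey_20240315_site-a-moisture_v02.csv")` returns the project, date, description, version (`2`), and extension. Running a check like this over a project folder is one way to catch stray names before they accumulate.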
There are a host of reasons why research data management is important:
Researchers benefit personally from good research data management practice. It helps them navigate required processes, protect their intellectual property, locate and accurately distinguish between files and datasets, keep those files secure and share them with collaborators, and improve their opportunities to collaborate, publish, be cited, and carry out further research.
Equally, pressures on researchers and institutions are growing: the research process is subject to greater oversight, and there are increasing demands for evidence of research integrity, driven in part by the principle that data are a public good. Legislative and regulatory requirements covering both disclosure (Freedom of Information) and confidentiality place significant demands on researchers, as does the range of funding bodies' data policies.
- Data, like journal articles and books, is a scholarly product.
- Risks of data loss. Data (especially digital data) is fragile and easily lost.
- Some research cannot be repeated, e.g. observational weather measurements.
- Institutional reputational risk – can you demonstrate that your research has been verified and validated, and that it has integrity?
- Work may need to be repeated if data are not documented well enough to make sense of them later.
- ‘Big data’ – good management enables re-use.
- It is simply good practice – it helps you share, cite, and re-use data.
- There are growing research data requirements imposed by funders and publishers.
- Institutional reputational and funding risk if infrastructure is lacking and/or practice is poor.
- It makes data easy to find and to combine with others’ data.
- It helps you identify versions of data.
- To enable sharing. Well-managed and accessible data allows others to validate and replicate findings. Research data management facilitates sharing of research data and, when shared, data can lead to valuable discoveries by others outside of the original research team.
- Citation impact – data that are made available can be cited, so you get credit for your work.
- It demonstrates value for funding and improves the likelihood of further funding.
- It enables collaboration.
- Research data management saves time and resources in the long run.
- Good management helps to prevent errors and increases the quality of your analyses.
You need to think about data management as early as possible and throughout the research lifecycle. Data management is not a single task to be ticked off at a particular stage of the research process; it is integral to conducting research.
An important first step in managing your research data is planning. To get you started thinking about data management planning, here are some of the issues you need to consider:
- Your institution's and funding agency's expectations and policies
- Whether you collect new data or reuse existing data
- The kind of data collected and its format
- The quantity of data collected
- Whether versions of the data need to be tracked
- Storage and backup of active data, including the backup policy and its implementation
- Storage and archiving options and requirements
- Organizing and describing or labeling the data
- Data access and sharing
- Privacy, consent, intellectual property, and security issues
- Roles and responsibilities for data management on your research team
- Budgeting for data management
Alongside ‘Research Data Management’, a further term, ‘Digital Curation’, is increasingly used, and researchers should become familiar with it.
"Digital curation is the selection, preservation, maintenance, collection and archiving of digital assets.
Digital curation is generally referred to the process of establishing and developing long term repositories of digital assets for current and future reference by researchers, scientists, historians, and scholars. Enterprises are starting to utilize digital curation to improve the quality of information and data within their operational and strategic processes." Wikipedia
The digital curation lifecycle
Digital curation and data preservation are ongoing processes, requiring considerable thought and the investment of adequate time and resources. You must be aware of, and undertake, actions to promote curation and preservation throughout the data lifecycle.
The digital curation lifecycle comprises the following steps:
- Conceptualise: conceive and plan the creation of digital objects, including data capture methods and storage options.
- Create: produce digital objects and assign administrative, descriptive, structural and technical archival metadata.
- Access and use: ensure that designated users can easily access digital objects on a day-to-day basis. Some digital objects may be publicly available, whilst others may be password protected.
- Appraise and select: evaluate digital objects and select those requiring long-term curation and preservation. Adhere to documented guidance, policies and legal requirements.
- Dispose: rid systems of digital objects not selected for long-term curation and preservation. Documented guidance, policies and legal requirements may require the secure destruction of these objects.
- Ingest: transfer digital objects to an archive, trusted digital repository, data centre or similar, again adhering to documented guidance, policies and legal requirements.
- Preservation action: undertake actions to ensure the long-term preservation and retention of the authoritative nature of digital objects.
- Reappraise: return digital objects that fail validation procedures for further appraisal and reselection.
- Store: keep the data in a secure manner as outlined by relevant standards.
- Access and reuse: ensure that data are accessible to designated users for first time use and reuse. Some material may be publicly available, whilst other data may be password protected.
- Transform: create new digital objects from the original, for example, by migration into a different form.
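Several of the steps above, notably preservation action and reappraisal, depend on being able to tell whether a stored digital object is still exactly what was deposited. One widely used technique for this is fixity checking: record a checksum for each file when it is ingested, then recompute and compare the checksums periodically. The lifecycle does not prescribe this particular method; the sketch below is an illustration that assumes files on a local disk and uses SHA-256 as the checksum algorithm. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 digest, reading in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(paths):
    """At ingest: record a reference checksum for each file, keyed by its path."""
    return {str(p): sha256_of(p) for p in paths}


def verify_fixity(manifest):
    """Later: recompute checksums and return the paths that fail validation.

    A missing file and a changed checksum both count as failures; in the
    lifecycle above, such objects would be candidates for reappraisal.
    """
    failures = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            failures.append(path)
    return failures
```

In practice the manifest returned by `record_fixity` would itself be stored securely, ideally separately from the data it describes, so that a single failure cannot corrupt both the files and their reference checksums.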
Digital curation involves maintaining, preserving and adding value to digital research data throughout its lifecycle. The active management of research data reduces threats to their long-term research value and mitigates the risk of digital obsolescence. As well as reducing duplication of effort in research data creation, curation enhances the long-term value of existing data by making it available for further high quality research.