Data Sharing Requirements
What does the Open Access Policy require for source data?
The Open Access Policy requires that the data underlying published research results be made accessible and open immediately.
The following overview focuses on data underlying published research. Activities around pre-publication data planning, collection, analysis, storage, sovereignty, informed consent, interoperability, and the use of disciplinary standards are addressed at the individual grant and contract level by the grantee and program officer during the proposal stage.
More information about the data you need to include, where your data can be stored, and how your data should be presented is available on the Gates Open Research website.
These guidelines are aligned with existing industry best practices, including the data availability policies of several publishers. As new practices emerge, the guidelines will be reviewed and updated as needed. Questions or suggestions about these guidelines should be sent to [email protected].
What is underlying data?
Underlying data encompasses all primary data, associated metadata, and any additional relevant data necessary to understand, assess, and replicate the reported study findings in full.
Underlying data can be provided in any file type, together with any access instructions, code, or supporting information files needed to ensure that others can access and use the file(s).
Note: We do not require sharing of data that is ethically unsound or legally encumbered.
Why is access to underlying data important?
Providing access to underlying data is key to fulfilling the foundation’s mission of rapid and free exchange of scientific ideas to move humanity forward by improving and saving lives. When access is free of barriers, the scientific community can benefit from data and build upon each other’s work.
Access to underlying data allows for:
- Barrier-free and timely access to data
- Reassessment of current data interpretations and analysis
- Ability to verify, reproduce, and reuse data in new ways
- Data provenance and preservation
What are Data Availability Statements?
The foundation’s updated policy requires that grantees make publicly available all underlying data necessary to replicate the published findings at the time of publication. When specific legal or ethical restrictions prohibit public sharing of a data set, grantees must indicate how others may obtain access to the data.
All articles must include a Data Availability statement, even where there is no data associated with the article. This statement should be added to the end of the manuscript prior to submission. The Data Availability statement should not direct readers or reviewers to contact an author to obtain the data; instead, it should state where the underlying data can be found. The Data Availability statement should also be published as part of the final article.
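As an illustration only, the sketch below assembles a simple Data Availability statement from a repository name, a DOI, and a license, and flags wording that sends readers to the authors instead of to the data. The function names, the template wording, and the list of flagged phrases are hypothetical assumptions, not text prescribed by the policy; the DOI used is a placeholder.

```python
import re

# Hypothetical phrases suggesting that a statement defers to the authors
# instead of pointing to where the data are held. Not an official list.
AUTHOR_CONTACT_PATTERNS = [
    r"\bcontact(ing)? the (corresponding )?authors?\b",
    r"\bavailable (up)?on (reasonable )?request\b",
    r"\bfrom the (corresponding )?authors?\b",
]


def build_statement(repository: str, doi: str, license_name: str) -> str:
    """Compose a statement that points readers directly to the deposited data."""
    return (
        f"The data underlying this article are available in {repository} "
        f"at https://doi.org/{doi}, under the terms of the {license_name} license."
    )


def defers_to_authors(statement: str) -> bool:
    """Return True if the statement appears to tell readers to ask the authors."""
    return any(re.search(p, statement, re.IGNORECASE) for p in AUTHOR_CONTACT_PATTERNS)


if __name__ == "__main__":
    # Placeholder DOI for illustration; substitute the DOI your repository assigns.
    s = build_statement("Zenodo", "10.5281/zenodo.0000000", "CC BY 4.0")
    print(s)
    print(defers_to_authors(s))  # False
    print(defers_to_authors(
        "Data are available from the corresponding author on reasonable request."
    ))  # True
```

A pattern check like this can only catch obvious phrasings; the statement still needs a human read before submission.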
These publishers have detailed requirements for underlying data and guidelines for data availability statements:
How should grantees make data accessible and open?
The repository you choose should:
- Enable immediate open access to the underlying data upon publication of your article.
- Allow funder acknowledgment.
- Allow reuse under a license no more restrictive than the Creative Commons Attribution 4.0 International License (CC BY 4.0).
- Assign your dataset a persistent and unique identifier, such as a DOI (digital object identifier), to facilitate linking and citation.
- Provide long-term storage and preservation, for example by meeting ISO standards for trustworthy digital repositories.
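The criteria above can be treated as a checklist when comparing candidate repositories. The sketch below encodes them for a hypothetical repository description; the `Repository` fields are assumptions, and so is the license allowlist, which uses SPDX identifiers for CC0 1.0 and CC BY 4.0 as examples of licenses no more restrictive than CC BY 4.0.

```python
from dataclasses import dataclass

# Assumption: licenses treated as "no more restrictive than CC BY 4.0",
# written as SPDX identifiers. Check license terms before extending this set.
ACCEPTABLE_LICENSES = {"CC0-1.0", "CC-BY-4.0"}


@dataclass
class Repository:
    """Hypothetical description of a candidate repository's features."""
    name: str
    open_on_publication: bool
    records_funder: bool
    licenses: set[str]
    issues_doi: bool
    long_term_preservation: bool


def unmet_criteria(repo: Repository) -> list[str]:
    """List the repository criteria that the candidate does not meet."""
    gaps = []
    if not repo.open_on_publication:
        gaps.append("immediate open access on publication")
    if not repo.records_funder:
        gaps.append("funder acknowledgment")
    if not repo.licenses & ACCEPTABLE_LICENSES:
        gaps.append("CC BY 4.0 or less restrictive license")
    if not repo.issues_doi:
        gaps.append("persistent unique identifier (e.g. DOI)")
    if not repo.long_term_preservation:
        gaps.append("long-term storage and preservation")
    return gaps


if __name__ == "__main__":
    # Entirely hypothetical repository, used only to show the output shape.
    candidate = Repository("Example Repo", True, False, {"CC-BY-NC-4.0"}, True, True)
    print(unmet_criteria(candidate))
```

An empty list means the candidate meets every criterion as encoded here; it does not replace reading the repository's own terms of service.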
Where should grantees deposit data?
Best practice: Deposit data in a repository already established for your research domain according to the recognized standards of your discipline. Required or suggested repositories are often identified within a journal’s author guidelines.
Gates Open Research Data Guidelines provide submission options based on data type.
For further suggestions, see:
- Re3data.org’s Registry of Research Data Repositories
- PLOS’ Recommended Data Repositories
- Scientific Data’s Recommended Data Repositories
When no established repository is available: Deposit data in your institutional research data repository or in a generalist repository, such as:
- Zenodo – a repository developed and hosted by CERN that enables researchers to share and preserve research outputs of any size and format, from any science.
- Dataverse – an open source web application developed by Harvard University to share, preserve, cite, explore, and analyze research data.
- Dryad – a curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable.
- Figshare – a repository where users can make all of their research outputs available in a citable, shareable, and discoverable manner.
What is FAIR data?
The FAIR data principles are guidelines to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets. The principles emphasize machine-actionability, because humans increasingly rely on computational support to manage data as its volume and complexity grow rapidly.
The FAIR data principles:
- Support knowledge discovery and innovation
- Support data and knowledge integration
- Promote sharing and reuse of data
- Are discipline-independent while allowing for differences between disciplines
- Help make data and metadata ‘machine readable’, supporting new discoveries through the harvesting and analysis of multiple datasets.
How do FAIR and Open Data differ?
FAIR data stresses that data must be retrievable without specialized or proprietary tools or communication methods, and that data should be released with a clear and accessible usage license. Individuals and organizations that put FAIR data principles into practice may do so under a variety of data usage licenses. In other words, FAIR does not necessarily imply Open; data can be FAIR and shared under restrictions.
There are many valid reasons to restrict data access, including data that contains personal information, cases where consent has not been given for release, confidential commercial information, and situations where there are sound public-good reasons for restricting data (e.g. protection of endangered species, archaeological sites, or aspects of national security). In such cases, anonymization techniques, data sharing agreements, and safe havens where data can be accessed under controlled and secure conditions are key.
That being said, the greatest benefits come when data is both FAIR and Open, supporting the widest possible reuse, and reuse at scale.
How can grantees meet the FAIR Data Principles?
The foundation endorses the FAIR Data Principles as a framework to promote the broadest reuse of research data.
Findable
In order for data to be reused, it must be findable. To ensure that others can find your data, we ask that data be hosted by a stable and recognized open repository (where it is safe to do so) and assigned a globally unique persistent identifier (such as a DOI). Using such a repository and identifier ensures that your dataset remains available to both humans and machines in a usable form in the future. To aid discoverability, data should also be described using appropriate metadata. The content and format of metadata are often guided by a specific discipline and/or repository through the use of a metadata standard. When depositing data in a repository, fill in as many fields as possible, as this information usually contributes to the metadata record(s). In some cases, particularly when using a discipline-specific repository, you may be required to submit metadata files alongside the data.
Grantee actions
- Make the output accessible via the web
- Obtain a unique, persistent identifier for each data output
- Annotate the output with rich metadata describing its origin, contents, and related project outputs
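Before depositing, it can help to check a draft metadata record for missing fields and a malformed identifier. The sketch below is a minimal pre-deposit check under stated assumptions: the required field names are hypothetical, loosely modelled on the mandatory properties of the DataCite Metadata Schema, and the DOI pattern is a heuristic syntax check similar to one Crossref has suggested for matching modern DOIs. It does not confirm that a DOI resolves; the DOI in the example is a placeholder.

```python
import re

# Heuristic DOI syntax check: "10." + registrant code + "/" + suffix.
# Matches most modern DOIs; it does NOT prove the DOI is registered.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

# Hypothetical minimum field set, loosely based on DataCite's mandatory
# properties (identifier, creator, title, publisher, year, resource type).
REQUIRED_FIELDS = (
    "identifier", "creators", "title",
    "publisher", "publication_year", "resource_type",
)


def check_findability(record: dict) -> list[str]:
    """Return a list of problems found in a draft metadata record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing field: {field}")
    identifier = record.get("identifier", "")
    if identifier and not DOI_PATTERN.match(identifier):
        problems.append(f"identifier does not look like a DOI: {identifier}")
    return problems


if __name__ == "__main__":
    draft = {
        "identifier": "10.5281/zenodo.0000000",  # placeholder DOI
        "creators": ["Doe, Jane"],
        "title": "Example dataset",
        "publisher": "Zenodo",
        "publication_year": 2024,
        "resource_type": "Dataset",
    }
    print(check_findability(draft))  # []
```

Store the bare DOI (starting with `10.`) in the record and build the `https://doi.org/` link when citing it; the pattern above rejects full URLs by design.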
Accessible
Data accessibility is defined by the presence of a user license. Data supporting Gates-funded research should be openly published under the CC0 license, which facilitates data reuse. For software and source code, we strongly advise the use of an OSI-approved license. We recognize that there are cases where openly sharing data may not be feasible (for example, due to ethical or confidentiality considerations). In these cases, we have policies in place that allow the publication of articles associated with such data while maintaining the appropriate level of security.
For practical guidance please see Add a Data Availability statement to your manuscript.
Grantee actions
- Apply an appropriate open usage license to the data
- Include a data availability statement with your article at submission
Interoperable
Interoperable data can be compared and combined with data from different sources by both humans and machines – promoting integrative analyses. To bolster interoperability, data should be stored in a non-proprietary open file format and described using a standard vocabulary (where available). In some cases, the preferred file formats and vocabularies will be dictated by the repository you choose to host your data.
For practical guidance please see Prepare your data for sharing.
Grantee actions
- Save the output in an open, easily processed file format
- For data sets, align any terms with domain-relevant community standards and vocabularies
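As one example of an open, easily processed format, tabular data can be exported as comma-separated values (CSV) with a header row and dates in ISO 8601 form, rather than kept in a proprietary spreadsheet file. The sketch below uses only Python's standard `csv` module; the column names and rows are hypothetical.

```python
import csv
import io
from datetime import date

# Hypothetical example rows; in practice these would come from your analysis.
rows = [
    {"site_id": "S01", "visit_date": date(2023, 3, 14), "weight_kg": 3.2},
    {"site_id": "S02", "visit_date": date(2023, 3, 15), "weight_kg": 2.9},
]


def to_open_csv(records: list[dict], stream) -> None:
    """Write records as CSV with a header row taken from the first record.

    Date values are written in ISO 8601 form (YYYY-MM-DD) so that any
    tool can parse them without locale-specific rules.
    """
    fieldnames = list(records[0].keys())
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    for record in records:
        writer.writerow({
            key: value.isoformat() if isinstance(value, date) else value
            for key, value in record.items()
        })


if __name__ == "__main__":
    buffer = io.StringIO()
    to_open_csv(rows, buffer)
    print(buffer.getvalue())
```

To write to disk instead, open the file with `open("data.csv", "w", newline="", encoding="utf-8")`; the `csv` module documentation recommends `newline=""` so that line endings are handled correctly.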
Reusable
Data that is findable, accessible, and interoperable is generally fit for reuse. On occasion, the inclusion of additional documentation alongside the data may be required to ensure that the data are understandable and thus reusable. As a general rule, someone who is not familiar with the data should be able to understand what it is about using only the metadata and documentation provided.
By extension, the same practices that enable data reuse also support reproducibility.
Grantee actions
- Include any additional documentation that enhances the understanding of the data
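One common form of additional documentation is a plain-text README with a data dictionary that describes each column and its unit. The sketch below renders such a file for a small hypothetical table; the layout (name, description, unit) is an assumed convention, not a required standard, and discipline-specific repositories may prescribe their own.

```python
# Hypothetical column descriptions as (name, description, unit) tuples.
# An empty unit string means the column has no unit.
DATA_DICTIONARY = [
    ("site_id", "Study site identifier assigned at enrolment", ""),
    ("visit_date", "Date of the visit, ISO 8601 (YYYY-MM-DD)", ""),
    ("weight_kg", "Body weight measured at the visit", "kg"),
]


def render_readme(title: str, description: str, license_name: str,
                  columns: list[tuple[str, str, str]]) -> str:
    """Render a plain-text README containing a simple data dictionary."""
    lines = [title, "", description, "", f"License: {license_name}", "", "Columns:"]
    for name, text, unit in columns:
        suffix = f" [{unit}]" if unit else ""
        lines.append(f"- {name}: {text}{suffix}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_readme(
        "Example dataset",
        "Hypothetical site-visit measurements used for illustration.",
        "CC0 1.0",
        DATA_DICTIONARY,
    ))
```

A useful test of the result is the rule stated above: someone unfamiliar with the data should understand each column from the README alone.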
What resources are available to help make data FAIR?
This list of resources can provide best practices and guidance to support grantees aiming to make data FAIR:
- F1000 Getting Started Guide – Simple steps and best practices to follow to make data FAIR and Open when publishing a research article.
- How to Make Your Research Data FAIR – An explanation of the FAIR principles, translated into practical information for researchers.
- Output Management Plan Template – Guidelines on FAIR data management and an example OMP template.
- Metadata Standards Directory – Online catalogue that can be searched for discipline-specific standards and associated tools.
- FAIRsharing.org – A curated and searchable portal of data standards, databases, and policies across many scientific disciplines.