ORFG’s Response to OSTP’s “Request for Public Comment on Draft Desirable Characteristics of Repositories for Managing and Sharing Data Resulting From Federally Funded Research”
Note: This public comment was submitted in conjunction with the OSTP’s request found here.
This response to the White House Office of Science and Technology Policy’s “Request for Public Comment on Draft Desirable Characteristics of Repositories for Managing and Sharing Data Resulting From Federally Funded Research” is submitted on behalf of the Open Research Funders Group. The Open Research Funders Group (ORFG) is a partnership of 16 philanthropic organizations committed to the open sharing of research outputs. We believe this benefits society by accelerating the pace of discovery, reducing information-sharing gaps, encouraging innovation, and promoting reproducibility. The ORFG engages a range of stakeholders to develop actionable principles and policies that promote greater dissemination, transparency, replicability, and reuse of papers, data, and a range of other research types. Our current roster of member organizations includes the Alfred P. Sloan Foundation, the American Heart Association, the Arcadia Fund, the Bill & Melinda Gates Foundation, the Eric & Wendy Schmidt Fund for Strategic Innovation, the Gordon and Betty Moore Foundation, Howard Hughes Medical Institute, the James S. McDonnell Foundation, the John Templeton Foundation, Arnold Ventures, the Leona M. and Harry B. Helmsley Charitable Trust, the Lumina Foundation, Open Society Foundations, Templeton World Charity Foundation, the Robert Wood Johnson Foundation, and the Wellcome Trust. Collectively, the ORFG members hold assets in excess of $100 billion, with total annual giving in the $10 billion range. Members’ interests span the entirety of the disciplinary spectrum, including life sciences, physical sciences, social sciences, and the humanities. This response has been prepared by Greg Tananbaum, the chief administrator of the Open Research Funders Group, in conjunction with representatives of the ORFG membership.
The Open Research Funders Group is supportive of the White House Office of Science and Technology Policy’s commitment to advance open science and foster implementation of agency Public Access Plans. Identifying best practices for the long-term preservation of data from Federally funded research is a critical component of these efforts. The ORFG is pleased to provide succinct input to the OSTP regarding desirable characteristics of data repositories. These recommendations are drawn from both the direct experience of our members, many of whom have open data policies for the research they fund, and our engagement with the broader scientific community.
Federal grant recipients should, first and foremost, be expected to deposit their data in a data environment that supports the FAIR data sharing principles - findable, accessible, interoperable, and reusable. The FAIR principles are at the core of the open data and reproducibility movement. Any repository housing Federally supported data should clearly and publicly articulate how it conforms to the core components of FAIR:
Findable
- (Meta)data are assigned a globally unique and persistent identifier
- Data are described with rich metadata (defined by R1 below)
- Metadata clearly and explicitly include the identifier of the data they describe
- (Meta)data are registered or indexed in a searchable resource
Accessible
- (Meta)data are retrievable by their identifier using a standardized communications protocol
- The protocol is open, free, and universally implementable
- The protocol allows for an authentication and authorization procedure, where necessary
- Metadata are accessible, even when the data are no longer available
Interoperable
- (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
- (Meta)data use vocabularies that follow FAIR principles
- (Meta)data include qualified references to other (meta)data
Reusable
- (Meta)data are richly described with a plurality of accurate and relevant attributes
- (Meta)data are released with a clear and accessible data usage license
- (Meta)data are associated with detailed provenance
- (Meta)data meet domain-relevant community standards
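As one illustration of how a repository might publicly articulate its conformance, the following is a minimal sketch of an automated check of a dataset's metadata record against a few of the components listed above. The field names, the wording of the labels, and the sample identifier are hypothetical and chosen only for illustration. Treating a DOI-shaped string as the persistent identifier is likewise an assumption, since other identifier schemes exist. A real repository would validate against its own metadata schema.

```python
import re

# Hypothetical field names mapped to the FAIR component each one supports.
# These are illustrative only; a real repository would use its own schema.
REQUIRED_FIELDS = {
    "identifier": "a globally unique and persistent identifier",
    "description": "rich descriptive metadata",
    "access_protocol": "a standardized communications protocol",
    "license": "a clear and accessible data usage license",
    "provenance": "detailed provenance",
}

# Assumption: the identifier is a DOI. Other persistent identifier
# schemes would need their own pattern.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def fair_gaps(record: dict) -> list[str]:
    """Return the FAIR components that the metadata record does not cover."""
    gaps = [
        label
        for field, label in REQUIRED_FIELDS.items()
        if not record.get(field)
    ]
    identifier = record.get("identifier", "")
    if identifier and not DOI_PATTERN.match(identifier):
        gaps.append("an identifier in a recognized persistent format")
    return gaps


# A made-up record used only to exercise the check.
example_record = {
    "identifier": "10.1234/example.5678",
    "description": "Survey responses collected for a funded study",
    "access_protocol": "https",
    "license": "CC0-1.0",
    "provenance": "Collected and deposited by the grant recipient",
}

print(fair_gaps(example_record))
```

A check of this kind covers only the components that can be verified mechanically. Components such as adherence to domain-relevant community standards would still require review by people familiar with the field.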
Given the wide range of projects funded by Federal agencies, no single repository will be universally applicable to house all funded datasets. Instead, the ORFG recommends that Federal agencies provide grant recipients with a degree of latitude in selecting the most appropriate repository to house their research data. In order for data from Federally funded research to reach their widest audience and have their deepest impact, they should be deposited in repositories with clear and explicit guidance along the following dimensions, over and above the FAIR components articulated above:
Re-Use. The repository must allow any interested party to freely access the data without restriction on research reuse, using a CC0 or similar license. This should be codified in the repository’s terms of service.
Security. The repository must describe how datasets are stored and protected from vulnerabilities such as credential theft or hacking. For any data that require gatekeeping on human subject protection or similar grounds, the repository must describe how this information is accessed and protected.
Stability. The repository must have a clearly articulated funding mechanism or business plan to provide reasonable assurances that the data will be available for the indefinite future. It should also have a continuity plan addressing what will happen to the data in the event the repository is discontinued.
Fee Structure. Any costs associated with data deposit and data maintenance must be clearly articulated. This includes details about whether fees are one-time or recurring, as well as how the size of the dataset may impact the cost. The repository must make these cost structures publicly available without restriction.
Subject Focus. There are hundreds of domain-specific repositories in operation at this writing. In general, grant recipients should be encouraged to deposit their data in a repository that is appropriate for the subject matter in question. Further, if a repository consistent with the considerations articulated in this document has emerged within a specific research community as the default resource in that field (e.g., GenBank for DNA sequences), grant recipients should, as a general rule, be encouraged to utilize that repository. This optimizes the ability of others to discover and build upon the data.
Metadata. The repository must require a depositor to provide sufficient metadata to enable the dataset to be used by others. These metadata should be searchable so that repository visitors can easily discover appropriate datasets.
File Formats. The repository should be able to accommodate all aspects of the grant recipients’ dataset, regardless of file type and size.
Machine Extraction. The data stored in the repository should be available in a machine-readable and machine-interpretable format, preferably via API (Application Programming Interface). This will encourage text and data mining, meta-analysis, information extraction, and additional knowledge discovery.
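To illustrate why machine-readable access matters, the following is a minimal sketch of how a researcher's script might consume a structured listing of the kind a repository API could return. The response layout, field names, and dataset entries are hypothetical and do not describe any particular repository's API. The sample is embedded as a string so the sketch runs without network access.

```python
import json

# Hypothetical JSON payload standing in for a repository API response.
# The structure and values are invented for illustration only.
SAMPLE_RESPONSE = """
{
  "datasets": [
    {"id": "ds-001", "subject": "genomics", "license": "CC0-1.0"},
    {"id": "ds-002", "subject": "ecology", "license": "CC0-1.0"},
    {"id": "ds-003", "subject": "genomics", "license": "CC0-1.0"}
  ]
}
"""


def count_by_subject(payload: str) -> dict[str, int]:
    """Count datasets per subject area in a JSON listing."""
    counts: dict[str, int] = {}
    for dataset in json.loads(payload)["datasets"]:
        subject = dataset["subject"]
        counts[subject] = counts.get(subject, 0) + 1
    return counts


print(count_by_subject(SAMPLE_RESPONSE))
```

Because the listing is structured, the same few lines work whether it describes three datasets or three million, which is what makes meta-analysis and data mining practical at scale.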
The Open Research Funders Group appreciates the opportunity to comment on this project, and we are eager to assist in its eventual rollout.