Ever since the White House Office of Science and Technology Policy (OSTP) introduced a policy on public access to federally funded research, federal granting agencies, non-profit granting agencies (like the Gates Foundation), publishers, universities, and researchers have been adjusting to changes in data access at the national level. The policy requires federal agencies with more than $100 million in annual research and development expenditures to develop a plan for making the results of the research they fund, including data, publicly accessible.
For researchers, this is a difficult landscape to navigate for a number of reasons:
- you may have joined a research project mid-grant and be unaware of the data management plan that was included in the grant proposal
- the data management plan that was included in the grant application is not being followed
- you’re not sure how funder mandates line up with publisher requirements
- the language that publishers use about data sharing or publishing isn’t straightforward
- you know that you’re supposed to make your data public, but you don’t know where or how to do this
There are a number of other obstacles that make data publishing difficult, but for today, let’s take a look at the data sharing policies of three publishers in the Engineering and Physical Sciences. Publishers often use suggestive or idealistic language, but does that mean you’re off the hook for sharing? And if your publisher does require you to make your data public, how do you comply with both your funder’s data mandate and your publisher’s data policy?
Elsevier is a massive publisher of thousands of journals in Health, Life Sciences, Physical Sciences and Engineering, and Social Sciences and Humanities. They also publish books and major reference works, and, somewhat recently, they acquired the citation management software Mendeley. Their most recent product, Mendeley Data, is a cloud-based repository for datasets. To sum it up: Elsevier is huge. They’ve divided their research data policy into two parts: Principles (the expectations, “shoulds,” and “needs” underpinning their research data policy) and Policy (what they actually do). Elsevier’s principles are idealistic and sound great, but their policies are merely suggestive.
For example, Elsevier’s research data principles and policy include statements such as:
“Research data should be made available free of charge to all researchers wherever possible and with minimal reuse restrictions.”
“We will encourage and support researchers and research institutions to share data where appropriate and at the earliest opportunity.”
In their Research Data FAQ, they address the question:
“Is it compulsory to share my research data?”
Their answer, in short, is no. They’ve taken an interesting approach that sets researchers up to share their data (if they are prepared to do so) without being prescriptive. Elsevier makes it easy to link to datasets in other repositories, and has even started its own repository, Mendeley Data (that’s another blog post for another day). Elsevier has also jumped into the data journal game with its open access publication Data in Brief. Data publications are emerging as a way for researchers to write an additional article that describes in depth the datasets behind their research. This article format gives data, which are typically buried in supplementary material, another avenue for discovery.
Imagine what could happen to the world of data sharing if a research giant like Elsevier made their policies less like principles and required research data sharing instead of suggesting it.
Springer Nature was formed when Springer and the Nature Publishing Group announced a merger in January 2015. The new publishing giant produces about 13% of the papers in the scholarly publishing market, still behind Elsevier at 23% (The Scholarly Kitchen). About a year after the merger, the new publisher developed an approach to research data policy that would let it remain flexible across its wide range of journals.
It settled on four policy types:
- Type 1: data sharing and data citation are encouraged
- Type 2: data sharing and evidence of data sharing are encouraged
- Type 3: data sharing is encouraged and statements of data availability are required
- Type 4: data sharing, evidence of data sharing, and peer review of data are required
The Springer Nature approach allows for flexibility and takes into account the current practices of each discipline the publisher supports. However, prior to submission, you need to know which policy your Springer Nature journal follows (yet another argument for following good data management practices from the start). Let’s take a closer look at each policy.
- Research Data Policy Type 1 is the most lenient: it encourages data sharing and citation without requiring either. I like to think of Type 1 as “data sharing lite,” because Springer Nature provides you with information about how to share and cite data, but you don’t have to. A few titles in this category are: Academic Questions, Accreditation and Quality Assurance, Aesthetic Plastic Surgery, Contemporary Islam, and Journal of Happiness Studies.
- Research Data Policy Type 2 asks authors to be more open with their relevant raw data by implying that the data will be available to any researcher who would like to reuse them for non-commercial purposes (barring confidentiality issues). This policy falls somewhere between “optional” and “mandatory.” The publisher is telling readers of Type 2 journals that the data are freely available for them to reuse, which warns, or prepares, authors that they may be asked for their data. The easiest way to handle such requests is to make the data publicly available in a repository, with a citation and an assigned digital object identifier (DOI). A few examples of Type 2 journals include: Agronomy for Sustainable Development, BioEnergy Research, Brain Imaging and Behavior, and Journal of Geovisualization and Spatial Analysis.
- Research Data Policy Type 3 is geared specifically toward journals that publish life sciences research. Authors submitting to Type 3 journals are strongly encouraged to deposit data in repositories, and it is implied that all raw data are freely available (again, barring confidentiality issues) to any researcher who requests them. Under Types 1 and 2, authors may deposit data in general repositories; under Type 3, however, specific types of data must be deposited in a list of prescribed repositories. For example, DNA and RNA sequencing data must be deposited in the NCBI Trace Archive or the NCBI Sequence Read Archive (SRA). A few examples of Type 3 journals include: Journal of Hematology and Oncology, Nature Cell Biology, and Nature Chemistry.
- Research Data Policy Type 4 requires that all datasets supporting the paper’s conclusions be available to reviewers and readers. The datasets must be deposited in repositories before peer review (or be provided as supplementary material), and publication is conditional on the data being in the appropriate repository. Examples of Type 4 journals include BMC Biology, Genome Biology, and Retrovirology.
AAAS, the American Association for the Advancement of Science, is much smaller in scope than Springer Nature and Elsevier. AAAS is both a professional society and a reputable publisher of six journals: Science, Science Translational Medicine, Science Signaling, Science Advances, Science Immunology, and Science Robotics. Because it publishes only a small fraction of what the other two produce, AAAS can set much stricter research data policies. Datasets must be deposited in approved repositories, with an accession number, before publication. AAAS encourages compliance with the MIBBI (Minimum Information for Biological and Biomedical Investigations) guidelines and provides a list of approved repositories by data type (similar to Springer Nature Type 4). AAAS stipulates not only that data must be available, but also that all materials necessary to understand and assess the research be made available. This includes code, patents, and even fossils or rare specimens. Please see AAAS’s publication policies for more information.
These three publishers fall along a scale, from Elsevier’s “suggestive” and “encouraging” data policies to AAAS’s strict mandates for sharing research materials. Ultimately, you should prepare your data and supporting research materials, like code, from the beginning of a research project as if you were going to publish in a AAAS journal. There are other reasons to do this besides complying with publisher data sharing mandates; I’ll explore them in future posts.