University of California Research Data Policy: a few things to know

The University of California Office of the President recently announced an updated Research Data Policy, effective July 15, 2022. The new policy complements the original policy from 1958. It reconfirms that research data are owned by the University and outlines how University Researchers may use the data they generate or collect in the course of their research. While most researchers will likely find that the updated policy doesn’t require a complete overhaul of their data stewardship practices, it’s important to understand the key terms, conditions, and permissions it establishes. The policy will also help researchers make decisions about data management, retention, publication, and transfer. Campus-level implementation of the policy is under development, and additional details are forthcoming.

A few key points: 

  • The Regents of the University of California own Research Data generated or collected in the course of University Research. 
    • Research Data include “recorded information embodying facts resulting from a scientific inquiry.” Research Data do not include scholarly & aesthetic works, informal notes, paper drafts, administrative or medical records, and other materials (see policy text for complete list).
    • University Research means “research conducted by a Principal Investigator or University Researcher that is within the course and scope of their assigned duties, uses University resources, and/or is funded by or through the University.”
  • University Researchers may use the Research Data they generate or collect in order to conduct other research, share with collaborators, publish outcomes, and create scholarly works. The University “supports the free and unfettered dissemination of information, knowledge, and discoveries generated by University Researchers.” As such:
    • Principal Investigators (PIs) are the stewards of Research Data and maintain autonomy over which data should be preserved or dispositioned;
    • Researchers may share data as dictated by scholarly/disciplinary standards or data management plans, or by legal, funder, or contractual requirements;
    • When a University Researcher leaves the UC, they may take copies of the data they generated or collected, provided the PI approves;
    • Neither the University nor University Researchers may assert ownership of Research Data owned by third parties.

 

Resources and Assistance: 

 

Written by Tim Vollmer, Erin Foster, and Anna Sackmann

 


NIH DMSP Frequently Asked Questions

The Library Data Services Program recently posted about the National Institutes of Health (NIH) Data Management and Sharing Policy and how it will affect UC Berkeley researchers. Read more about the new policy in that post.

Here is a list of FAQs about the new policy. Please contact the UC Berkeley Library Data Services Program (librarydataservices@berkeley.edu) with questions.

How is UC Berkeley responding to this policy?

The Library Data Services Program is collaborating with the Research Data Management Program to provide guidance and documentation to ensure compliance with the NIH policy.

What is considered “scientific data” for the purposes of this plan?

The final NIH Policy defines Scientific Data as: “The recorded factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publications. Scientific data do not include laboratory notebooks, preliminary analyses, completed case report forms, drafts of scientific papers, plans for future research, peer reviews, communications with colleagues, or physical objects, such as laboratory specimens.” The NIH states that “the final DMS Policy is designed to increase the sharing of scientific data, regardless of whether a publication is produced…Data that do not form the basis of a publication produced during the award period should be shared by the end of the award period.”

What is included in a Data Management and Sharing Plan?

In these documents (two pages maximum), researchers will describe their:

  • Data type(s)
  • Related tools, software, and/or code
  • Standards
  • Data preservation, access, and associated timelines
  • Access, distribution, or reuse considerations
  • Oversight of data management and sharing

Read more about Data Management Plans and see sample language

Can I make my data available upon request?

NIH strongly prefers that scientific data be shared and preserved through repositories or, for datasets up to 2GB, through PubMed Central-deposited supplemental data files, rather than kept by a researcher and provided upon request.

How will the plans be assessed?

NIH program staff will assess the DMS plans, but peer reviewers may comment on the proposed budget for data management and sharing.

What data repository should I use?

NIH encourages the use of established repositories. To select the best repository for your data, consider the following:

  • Is there a specific NIH repository named in the funding announcement?
  • Is there a data repository specific to your discipline?
  • If not, is there a general data repository you can use?

To learn more, read the NIH guidance on selecting a data repository.

What is a standard? What standards are relevant to my research?

A standard specifies exactly how data and related materials should be stored, organized, and described. In the context of research data, the term typically refers to the use of specific and well-defined formats, schemas, vocabularies, and ontologies in the description and organization of data. However, for researchers within a community where more formal standards have not been well established, it can also be interpreted more broadly to refer to the adoption of the same (or similar) data management-related activities, conventions, or strategies by different researchers and across different projects.

When do I need to make my data available?

NIH encourages scientific data to be shared as soon as possible, and no later than the time of an associated publication or the end of the performance period, whichever comes first.
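The timing rule is a simple "whichever comes first" comparison: data behind a publication are due by the earlier of the publication date and the end of the performance period, and data with no associated publication are due by the end of the performance period. As a minimal sketch in Python (the function name and dates are hypothetical, and this is not an NIH tool), the latest acceptable sharing date could be worked out like this:

```python
from datetime import date
from typing import Optional


def sharing_deadline(end_of_performance_period: date,
                     publication_date: Optional[date] = None) -> date:
    """Latest date by which data should be shared (hypothetical helper).

    Data with no associated publication: the end of the performance period.
    Data underlying a publication: whichever comes first, the publication
    date or the end of the performance period.
    """
    if publication_date is None:
        return end_of_performance_period
    return min(publication_date, end_of_performance_period)


# Example with made-up dates for an award ending June 30, 2026.
award_end = date(2026, 6, 30)
print(sharing_deadline(award_end, date(2025, 3, 1)))  # 2025-03-01
print(sharing_deadline(award_end))                     # 2026-06-30
```

This only computes the latest permissible date; NIH still encourages sharing as soon as possible.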

What data management and sharing costs can I include in my grant?

Allowable costs can include:

  • data curation and developing documentation (formatting data, de-identifying data, preparing metadata, curating data for a data repository)
  • data management considerations (unique and specialized information infrastructure necessary to provide local management and preservation before depositing in a repository)
  • preserving data in data repositories (data deposit fees)

Read more about allowable costs.

What happens if I do not comply with the NIH policy or make my data available as described in the DMS policy?

NIH Program Staff will be monitoring compliance with the policy during the funding period. “Noncompliance with Plans may result in the NIH ICO adding special Terms and Conditions of Award or terminating the award. If award recipients are not compliant with Plans at the end of the award, noncompliance may be factored into future funding decisions.”

I work with sensitive topics/populations – how do I protect my participants’ privacy?

NIH strongly encourages researchers who work with sensitive topics and/or populations to address data sharing in the Informed Consent process. See the UC Berkeley Human Research Protection Program’s Informed Consent page, which includes guidelines and appropriate form templates.

Researchers should pay special attention to their de-identification process to ensure that all identifying information has been fully removed. Researchers should consider depositing their data in restricted access repositories that require data use agreements and research plans in order to access the data. Contact librarydataservices@berkeley.edu if you would like guidance on selecting restricted access repositories.

Please view UCSF’s resources on data de-identification and sharing de-identified data for additional guidance.

 

Supplemental information from the NIH:

Responsible Management and Sharing of American Indian/Alaska Native Participant Data

Protecting Privacy When Sharing Human Research Participant Data

 

Many thanks to Ariel Deardorff at the UCSF Library for allowing us to adapt their list of Frequently Asked Questions and thank you to UC Berkeley’s Elliott Smith, Michael Sholinbeck, and Erin Foster for all of their expertise and contributions. 

 


Forthcoming NIH Data Management and Sharing Policy

On January 25, 2023, the National Institutes of Health (NIH) will implement a new Data Management and Sharing Policy. The Library Data Services Program and Research Data Management Program have several resources to help you adapt to the new policy, including extensive guidance and suggested language for writing a plan. Additionally, the DMPTool, which is free for UC Berkeley users and supported by the Library Data Services Program, walks grant applicants through the plan requirements. We will be offering four drop-in workshops designed for researchers this fall. Please register for the workshops using the links below:

 

 

The NIH is a leader in implementing data management plans and was the first federal granting agency to do so, with its 2003 NIH Sharing Policy. In the years since, the agency has developed two genomic data sharing policies (2008 and 2014) and addressed data sharing in clinical trials in 2016. The new policy builds on these existing data management requirements and is broad in scope, with the goal of maximizing the “…sharing of scientific data generated from NIH-funded or conducted research, with justified limitations or exceptions.”

 

The new policy will apply to all research funded (in whole or in part) by the NIH that produces scientific data. It will apply to grant applications submitted on or after January 25, 2023. The NIH defines scientific data as: “the recorded factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publications…” Please note that the NIH definition of scientific data does NOT include the following: laboratory notebooks, preliminary analyses, completed case report forms, drafts of scientific papers, plans for future research, peer reviews, communications with colleagues, or physical objects, such as laboratory specimens.

 

There are a few aspects that set the new NIH DMSP apart from current policy:

  • Plans will outline how data and metadata will be managed over the course of the project and which of these data will be shared.
  • Grant applicants will need to include details about software or other tools that will be used to analyze the data.
  • If generating data derived from human participants, plans will need to outline how confidentiality, privacy, and rights of those individuals will be preserved.
  • Plans must include a selected repository or repositories where the data will be preserved along with a timeline for sharing the data (either as soon as possible, no later than the time of an associated publication, or at the end of performance period if there is no associated publication). 
  • Updates to plans over the course of the project will be reviewed by the NIH ICO (institute, center, or office) during regular reporting intervals.
  • Data management costs may be added to the grant budget including data curation and developing documentation (e.g., formatting data, de-identifying data); data management considerations (e.g., unique and specialized information infrastructure necessary to provide local management and preservation before depositing in a repository); preserving data in data repositories (e.g., data deposit fees).
  • Compliance with plans will be measured during the funding period at regular reporting intervals.

 

Please see our list of frequently asked questions. For additional information about the policy, see the NIH page on the Data Management and Sharing Policy.

Many thanks to Elliott Smith, Michael Sholinbeck, and Erin Foster for all of their expertise and contributions!

If you have any questions or need additional support, please email librarydataservices@berkeley.edu. 


Nexis Data Lab Computing Environment

 

The UC Berkeley Library is pleased to announce access to a new text and data mining platform, Nexis Data Lab from LexisNexis. The cloud-based platform enables users to run computational analysis in a Jupyter notebook on content licensed for use at UC Berkeley. Please take a look at this brief, two-minute video to see how the environment works. Researchers should be familiar with Python or R. Each account may have up to 6 projects (workspaces), with a limit of 100,000 documents per project. The number of seats is limited, so we ask that you have a text and data mining (TDM) project in progress before requesting one.

 

Please view the list of content and titles available to UC Berkeley users. LexisNexis will continue to make additional content available as the platform grows. (Note that the following publications are NOT available: The New York Times (NDL does include NYT International), The New York Times Blogs, Wall Street Journal Abstracts, Information Base Abstracts, and Jane’s Defence Weekly.)

 

If you would like to get started using the platform, request a seat by filling out this form. The Library is holding a training session for the platform (hosted by LexisNexis) on August 15, 2022 at 9:00 AM. Please register here for the event. For more information on text and data mining platforms and resources available at UC Berkeley, please check out our guide to Text Mining and Computational Text Analysis. Contact the Library Data Services Program at librarydataservices@berkeley.edu with questions. 

 



Publisher Data Requirements Revisited

In May and September of 2017, the Library wrote posts (read them here and here) about a number of publisher research data policies. Over the last year, publishers have engaged in conversations with institutions, funders, and not-for-profit organizations to examine how they can better shape and influence the sharing of research data.

Image from Unsplash by Franki Chamaki

To accompany their data sharing policies and recommendations, publishers like Springer Nature and Elsevier recently developed their own research data services to better assist researchers who are preparing their data to be published alongside a manuscript. They now provide individual guidance (for a fee) and repositories in which to deposit and share data. Please talk to a consultant at UC Berkeley’s Research Data Management Program about the guidance we can provide, along with University of California-supported data sharing options.

Elsevier continues to communicate about research data through a series of principles (data should be made available free of charge wherever possible with minimal reuse restrictions; by enabling effective reuse of data we’re finding efficiencies and preventing duplication of effort). These principles map to a series of policies. The policies speak to how Elsevier will support and encourage researchers when sharing data. Elsevier’s research data guidelines, which remain largely unchanged since last year, prescribe how and when researchers will share their data. Elsevier’s journals are assigned to one of five research data guidelines, which have slight variations in language and range from “encouraged to deposit research data” to “required to deposit research data.”

When submitting to an Elsevier journal, be sure to check the individual journal’s Guide for Authors, which is located on the journal homepage. Elsevier does not maintain a master list of journals mapped to the five research data guidelines. Your subject librarian can provide guidance if you need more information about the data publishing policies from a specific Elsevier title. If you don’t know where you will submit your research, it’s best to prepare for the most rigorous data policy by adhering to a data management plan throughout the course of your work.

Springer Nature’s data publishing policies follow the same four-tiered structure the publisher developed in 2017; however, each tier now includes more nuanced requirements for the life sciences and the non-life sciences. Check here to see the publisher’s list of journals and their assigned data publishing policies.

Wiley applies one of three data sharing policies to its journals: encourages data sharing, expects data sharing, or mandates data sharing. The publisher has created an author compliance tool that lets researchers submitting papers to its journals check what they need to do with their data to comply with their funder, institution, and journal. For example, if your research is funded by the NIH, you work at a University of California institution, and you would like to publish in Bioengineering and Translational Medicine, you’ll learn that the journal encourages you to share your data, the NIH requires you to share your data, and the university does not have a policy. When requirements differ like this, default to the entity that requires the most sharing; here, you would share your data as stipulated by the NIH.
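The "default to the entity that requires the most sharing" rule amounts to ranking each requirement and picking the highest. The Python sketch below is only an illustration: the ranking labels are a hypothetical simplification of the policy wording above, not output from Wiley's compliance tool, and real policies differ in details (timing, repositories, exceptions) that a single ranking cannot capture.

```python
from typing import Dict, Tuple

# Hypothetical ranking of sharing requirements, weakest to strongest.
# "requires" (funder wording) and "mandates" (Wiley wording) are treated
# as equally strict.
RANK = {"no policy": 0, "encourages": 1, "expects": 2, "requires": 3, "mandates": 3}


def strictest(requirements: Dict[str, str]) -> Tuple[str, str]:
    """Return (entity, level) for the entity whose requirement ranks highest."""
    return max(requirements.items(), key=lambda item: RANK[item[1]])


# The example scenario from the paragraph above.
example = {
    "Bioengineering and Translational Medicine": "encourages",
    "NIH": "requires",
    "University of California": "no policy",
}
print(strictest(example))  # ('NIH', 'requires')
```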

Wiley’s author compliance tool points out the gaps in policy that exist for researchers, especially in the United States. Data sharing policies differ widely between institutions, publishers, and funders, which leads to confusion for researchers. In general, when planning research and communicating your results, take the Open Science approach, which advocates for showing and sharing your work in the name of advancing science. By thoroughly documenting your data and research process, you make it easier for others to understand your work and potentially reuse the data for another research purpose. The Open Science approach supports transparency and reuse, which results in better science and more rapid advances. If you would like more information about preparing your data to be shared with others, please contact the Research Data Management Program.

 


NEW DMPTool Launched!

A shiny new version of the DMPTool was launched at the end of February. The big change, beyond the new color scheme and layout, is that it is now a single-source platform for all DMPs. It now incorporates the codebase from other instances of the program from all over the world, including DMPTuuli (Finland), DMP Melbourne (Australia), DMP Assistant (Canada), DMPOnline (Europe), and many more! All platforms were combined into one in order to focus on best practices at an international level. Please learn more about the new instance by visiting the DMPTool Blog.



Love Data Week 2018!

The University Library, Research IT, and Berkeley Institute for Data Science invite faculty, students, and staff to a series of events on February 12–16 during Love Data Week 2018. Love Data Week is a nationwide campaign designed to raise awareness about data visualization, management, sharing, and preservation.
Please join us to learn about multiple data services that the campus provides and discover options for managing and publishing your data. Graduate students, researchers, librarians, and data specialists are invited to attend these events to gain hands-on experience, learn about resources, and engage in discussion around researchers’ data needs at different stages of their research process.
To register for these events and find out more, please visit: http://guides.lib.berkeley.edu/ldw2018guide
Schedule:
 
Intro to Scopus APIs – Learn about the different types of APIs Scopus has to offer and how to get data from them. In the first hour, learn about the portal, what the API can do, and different use cases. Following a short break, the instructor will take the group through live queries, show how to test code, provide tips and tricks, and leave the group with sample code to work with. Attendees will be able to follow up with the instructor via webinar to troubleshoot and ask further questions about specific projects. Register here.
1:00 – 3:00 p.m., Tuesday, February 13, Doe Library, Room 190 (BIDS)
Refreshments will be provided.
Data Stories and Visualization Panel – Learn how data is being used in creative and compelling ways to tell stories. Researchers across disciplines will talk about their successes and failures in dealing with data.
1:00 – 2:45 p.m., Wednesday, February 14, Doe Library, Room 190 (BIDS)
Refreshments will be provided.
Planning for & Publishing your Research Data – Learn why and how to manage and publish your research data, as well as how to prepare a data management plan for your research project.
1:00 – 2:00 p.m., Thursday, February 15, Doe Library, Room 190 (BIDS)
We hope to see you there!

Alice Fan, MD: New Cancer Therapies & Women in Science

The last Nanoscale Science and Engineering (NSE) seminar of the semester is scheduled for Friday, December 1, from 2:00 to 3:00 p.m. in 180 Tan Hall. Alice Fan, from the Stanford Medical School, will speak on new nanoimmunoassays that enable the isolation and analysis of tumor cells. Following her talk, the Graduate Women in Engineering (GradSWE) will host a coffee hour from 3:30 to 4:30 p.m. in 242 Sutardja Dai Hall.
