Forthcoming NIH Data Management and Sharing Policy

On January 25, 2023, the National Institutes of Health (NIH) will implement a new Data Management and Sharing Policy. The Library Data Services Program and Research Data Management Program have several resources to help you adapt to the new policy, including extensive guidance and suggested language for writing a plan. Additionally, the DMPTool, which is free for UC Berkeley users and supported by the Library Data Services Program, walks grant applicants through the plan requirements. We will be offering four drop-in workshops designed for researchers this fall. Please register for the workshops using the links below:



The NIH is a leader in implementing data management plans and was the first federal granting agency to do so with its 2003 NIH Sharing Policy. In the years since, the agency developed two genomic data sharing policies (2008 and 2014), and addressed data sharing in clinical trials in 2016. The new data sharing policy builds on these existing data management requirements and is broadly sweeping, with the goal of maximizing the “…sharing of scientific data generated from NIH-funded or conducted research, with justified limitations or exceptions.”


The new policy will apply to all research funded (in whole or in part) by the NIH that produces scientific data. It will apply to grant applications submitted on or after January 25, 2023. The NIH defines scientific data as: “the recorded factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publications…” Please note that the NIH definition of scientific data does NOT include the following: laboratory notebooks, preliminary analyses, completed case report forms, drafts of scientific papers, plans for future research, peer reviews, communications with colleagues, or physical objects, such as laboratory specimens.


There are a few aspects that set the new NIH DMSP apart from current policy:

  • Plans will outline how data and metadata will be managed over the course of the project and which of these data will be shared.
  • Grant applicants will need to include details about the software or other tools that will be used to analyze the data.
  • If generating data derived from human participants, plans will need to outline how confidentiality, privacy, and rights of those individuals will be preserved.
  • Plans must include a selected repository or repositories where the data will be preserved along with a timeline for sharing the data (either as soon as possible, no later than the time of an associated publication, or at the end of performance period if there is no associated publication). 
  • Updates to plans over the course of the project will be reviewed by the NIH ICO (institute, center, or office) during regular reporting intervals.
  • Data management costs may be added to the grant budget, including: data curation and documentation development (e.g., formatting data, de-identifying data); data management considerations (e.g., unique and specialized information infrastructure necessary to provide local management and preservation before depositing in a repository); and preservation of data in repositories (e.g., data deposit fees).
  • Compliance with plans will be measured during the funding period at regular reporting intervals.


An FAQ page is under development and will be posted here shortly. For additional information about the policy, please see the NIH page on the Data Management and Sharing Policy.


If you have any questions or need additional support, please email 

Introducing the New Research Data Management (RDM) Program Website

Screenshot of homepage

As Service Lead of the Research Data Management (RDM) Program, I am very excited to announce the launch of the new RDM Program website! Since its launch in 2015, the RDM Program has supported and advocated for campus researchers and their research data needs. In partnership with other campus research support units, we consult with researchers and research groups to understand their needs and connect them to systems and tools that facilitate their research goals.

A key aspect of building this site was to highlight the areas in which the RDM Program offers support for research data. From the Service Areas page, you can learn more about the six areas in which the RDM Program offers consulting, delivers training, and develops documentation for the UC Berkeley research community. These areas are: data classification & security, data collection, data management, data sharing, data storage & backup, and data transfer.

Additionally, the RDM Program sponsors and contributes to several events throughout the year, most notably Love Data Week and Women in Data Science (WiDS) at Berkeley. See content from previous years on the Events and Training page as well as the YouTube playlist of our recorded trainings!

By moving to the Open Berkeley platform, the RDM Program site now aligns with other campus websites and comes with enhanced security and accessibility features as well as Berkeley branding. Content from the previous RDM Program website has been moved – either to the new site or to the Research IT documentation site – or archived using the Wayback Machine.

Special thanks to: Tiffany Vo for her design and user experience expertise, Amy Neeser for helping develop and archive content, and the rest of the RDM Program team for their support. I hope you find our updated site useful in navigating the Berkeley research landscape and connecting with help! Let me know at if there is anything you want to see featured or if you have questions.


Erin D Foster

Nexis Data Lab Computing Environment


The UC Berkeley Library is pleased to announce access to a new text and data mining platform, Nexis Data Lab from LexisNexis. The cloud-based platform enables users to run computational analysis in a Jupyter notebook on content licensed for use at UC Berkeley. Please take a look at this brief, two-minute video to see how the environment works. Researchers should be familiar with Python or R. Each account may have up to 6 projects (workspaces), with a limit of 100,000 documents per project. The number of seats is limited, so we ask that you have a TDM project in progress.
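For readers new to computational text analysis, here is a minimal Python sketch of the kind of task you might run in such a notebook: counting word frequencies across a small corpus. The sample documents are invented for illustration, and the snippet does not use Nexis Data Lab's actual data or interface.

```python
# A minimal word-frequency analysis, the kind of task you might run in a
# Jupyter notebook on a text corpus. The sample documents below are
# invented; this does not use Nexis Data Lab's actual data or interface.
import re
from collections import Counter

def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(documents):
    """Count word occurrences across a list of documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts

corpus = [
    "Data sharing improves research.",
    "Research data should be shared openly.",
]
freqs = word_frequencies(corpus)
print(freqs.most_common(3))
```

Real projects on the platform would layer richer methods (topic modeling, named-entity recognition, sentiment analysis) on top of this same tokenize-and-count foundation.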


Please view the list of content and titles available to UC Berkeley users. LexisNexis will continue to make additional content available as the platform grows. (Note that the following publications are NOT available: The New York Times (NDL does include NYT International), The New York Times Blogs, Wall Street Journal Abstracts, Information Base Abstracts, and Jane’s Defence Weekly.)


If you would like to get started using the platform, request a seat by filling out this form. The Library is holding a training session for the platform (hosted by LexisNexis) on August 15, 2022 at 9:00 AM. Please register here for the event. For more information on text and data mining platforms and resources available at UC Berkeley, please check out our guide to Text Mining and Computational Text Analysis. Contact the Library Data Services Program at with questions. 


Library Data Services Program logo

Create Change With Us: By Editing Wikipedia!

Edit For Change Editathon Logo

EVENT: Wikipedia Edit for Change: Workshop + group editing

Wednesday, February 16, 1:00pm-2:30pm


We look forward every year to offering the annual Library Wikipedia Editathon, but this year we’re mixing it up in some exciting new ways!

First, what is an editathon?  Many of us think of Wikipedia as a crowd-sourced online encyclopedia, which means that it’s only as good as its individual entries.  Library communities in particular are deeply committed to the quality of information in this much-used resource.  So supporting the Berkeley community in learning to edit through a group editing event with workshops for beginners (that is, an editathon) is a natural fit for us at the Library.

Second, what are we doing to mix it up?  Our focus is Edit For Change—we will support you in editing Wikipedia towards any change you’d like to see!   There is always room for improvement in Wikipedia’s topics and content.  We’ll offer an introductory-level Wikipedia workshop to support you in editing Wikipedia even if you’ve never edited before, and following the workshop, we’ll make edits together so you can pursue your editing interests more fully.

As an added bonus, we’re also excited to pair with the new initiatives of the recently-launched Library Data Services Program. This year we’re part of the University of California’s Love Data Week event calendar, which is an invigorating connection.  How does Wikipedia connect to data? You may have heard of Wikidata, but there are also other connections between Wikipedia and data science: for example, Wikipedia’s content is a treasure trove for researchers who analyze textual content using data science methods.  Here are some examples of the kind of research that is happening now, and here and here are some suggestions for approaching these methods if you are interested!

Intrigued?  We’d love to have you come participate in the event!  Registration is at . Questions?  Feel free to email us at, and we hope to “see” you on February 16!

“Big Data as a Way of Life”: How the UCB Library Can Support Big Data Research at Berkeley

This post summarizes findings and recommendations from the Library’s Ithaka S+R Local Report, “Supporting Big Data Research at the University of California, Berkeley” released on October 1, 2021.  The research was conducted and the report written by Erin D. Foster, Research Data Management Program Service Lead, Research IT & University of California, Berkeley (UCB) Library, Ann Glusker, Sociology, Demography, Public Policy, & Quantitative Research Librarian, UCB Library, and Brian Quigley, Head of the Engineering & Physical Sciences Division, UCB Library.


In 2020, the Ithaka S+R project “Supporting Big Data Research” brought together twenty-one U.S. institutions to conduct a suite of parallel studies aimed at understanding researcher practices and needs related to data science methodologies and big data research. A team from the UCB Library conducted and analyzed interviews with a group of researchers at UC Berkeley.  The timeline appears below.  The UC Berkeley team’s report outlines the findings from the interviews with UC Berkeley researchers and makes recommendations for potential campus and library opportunities to support big data research.  In addition to the UCB local report, Ithaka S+R will be releasing a capstone report later this year that will synthesize findings from all of the parallel studies to provide an overall perspective on evolving big data research practices and challenges to inform emerging services and support across the country.

Timeline of activities of Berkeley Ithaka report creation (June 2020-October 2021)


After successfully completing human subjects review, and using an interview protocol and training provided by Ithaka S+R, the team members recruited and interviewed 16 researchers from across ranks and disciplines whose research involved big data, defined as data having at least two of the following: volume, variety, and velocity.

Two charts showing the distribution of researchers interviewed, by their rank and their discipline (4 categories each)


After we transcribed the interviews and coded them using an open coding process, six themes emerged.  These themes and sub-themes are listed below and treated fully in the final report.  The report includes a number of quotes so that readers can “hear” the voices of Berkeley’s big data researchers most directly.  In addition, the report outlines the challenges reported by researchers within each theme.

List of the six themes developed from the research, and the subthemes associated with each



The most important part of the entire research process was developing a list of recommendations for the UC Berkeley Library and its campus partners. Based on the needs and challenges expressed by researchers, and influenced by our own sense of the campus data landscape, including the newly formed Library Data Services Program, these recommendations are discussed in more detail in the full report.  They reflect the two main challenges that interviewees reported Berkeley faces as big data research becomes increasingly common.  One challenge is that the range of discrete data operations happening all over campus, not always broadly promoted, makes it easy to have duplications of services and resources — and silos. The other (related) challenge is that Berkeley has a distinctive data landscape and a long history of smaller units on campus being at the cutting edge of data activities. How can these be better integrated while maintaining their individuality and freedom of movement?  Additionally, funding is a perennial issue, given that Berkeley is a public institution in an area with a high cost of living and a very competitive salary structure for the tech workers who provide data support.

Here are the report’s recommendations in brief:

  1. Create a research-welcoming “third place” to encourage and support data cultures and communities.

The creation of a “data culture” on campus, which can infuse everything from communications to curricula, can address challenges related to navigating the big data landscape at Berkeley, including collaboration/interdisciplinarity, and the gap between data science and domain knowledge. One way to operationalize this idea is to utilize the concept of the “third place,” first outlined by Ray Oldenburg.  This can happen in, but should not be limited to, the library, and it can occur in both physical and virtual spaces.  Encouraging open exploration and conversation across silos, disciplines, and hierarchies is the goal, and centering justice, equity, diversity, and inclusion (JEDI) as a core principle is essential.

  • The University Library, in partnership with Research IT, conducts continuous inquiry and assessment of researchers and data professionals, to be sure our efforts address the in-the-moment needs of researchers and research teams.
  • The University Library, in line with being a “third place” for conversation and knowledge sharing, and in partnership with a range of campus entities, sponsors programs to encourage cross-disciplinary engagement.
  • Research IT and other campus units institute a process to explore resource sharing possibilities across teams of researchers in order to address duplication and improve efficiency.
  • The University Library partners with the Division of Computing, Data Science, and Society (CDSS) to explore possibilities for data-dedicated physical and virtual spaces to support interdisciplinary data science collaboration and consultation.
  • A consortium of campus entities develops a data policy/mission statement, which has as its central value an explicit justice, equity, diversity and inclusion (JEDI) focus/requirement.
  2. Enhance the campus computing and data storage infrastructure to support the work of big data researchers across all disciplines and funding levels.

Researchers expressed gratitude for campus computing resources but also noted challenges with bandwidth, computing power, access, and cost. Others seemed unaware of the full extent of resources that were available to them. It is important to ensure that our computing and storage options meet researcher needs and then encourage them to leverage those resources.

  • Research, Teaching & Learning and the University Library partner with Information Services & Technology (IST) to conduct further research and benchmarking in order to develop baseline levels of free data storage and computing access for all campus researchers.
  • Research IT and the University Library work with campus to develop further incentives for funded researchers to participate in the Condo Cluster Program for Savio and/or the Secure Research Data & Computing (SRDC) platform.
  • The University Library and Research IT partner to develop and promote streamlined, clear, and cost-effective workflows for storing, sharing, and moving big data.
  3. Strengthen communication of research data and computing services to the campus community.

In the interviews, researchers directly or indirectly expressed a lack of knowledge about campus services, particularly as they related to research data and computing. In light of that, it is important for campus service providers to continuously assess how researchers are made aware of the services available to them.

  • The University Library partners with Research IT to establish a process to reach new faculty across disciplines about campus data and compute resources.
  • The University Library partners with Research IT and CDSS (including D-Lab and BIDS) to develop a promotional campaign and outreach model to increase awareness of the campus computing infrastructure and consulting services.
  • The University Library develops a unified and targeted communication method for providing campus researchers with information about campus data resources – big data and otherwise.
  4. Coordinate and develop training programs to support researchers in “keeping up with keeping up.”

One of the most-cited challenges researchers described in terms of training is keeping up with the dizzying pace of advances in the field of big data, which necessitate learning new methods and tools.  Even with postdoc/grad student contributions, it can seem impossible to stay up to date with needed skills and techniques. Accordingly, the focus in this area should be on helping researchers stay current in their fields.

  • The University Library addresses librarians’ and library staff members’ needs for professional development to increase comfort with the concepts of, and program implementation around, the research life cycle and big data.
  • The University Library’s newly formed Library Data Services Program (LDSP) is well-positioned to offer campus-wide training sessions within the Program’s defined scope, and to serve as a hub for coordination of a holistic and scaffolded campus-wide training program.
  • The University Library’s LDSP, departmental liaisons, and other campus entities offering data-related training should specifically target graduate students and postdocs for research support.
  • CDSS and other campus entities investigate the possibility of a certificate training program — targeted at faculty, postdocs, graduate students — leading to knowledge of the foundations of data science and machine learning, and competencies in working with those methodologies.

The full report concludes with a quote from one of the researchers interviewed, which the team feels encapsulates much of the current situation relating to big data research at Berkeley, as well as the challenges and opportunities ahead:

 [Physical sciences & engineering researcher] “The tsunami is coming. I sound like a crazy person heaping warning, but that’s the future. I’m sure we’ll adapt as this technology becomes more refined, cheaper… Big data is the way of the future. The question is, where in that spectrum do we as folks at Berkeley want to be? Do we want to be where the consumers are or do we want to be where the researchers should be? Which is basically several steps ahead of where what is more or less the gold standard. That’s a good question to contemplate in all of these discussions. 

Do we want to be able to meet the bare minimum complying with big data capabilities? Or do we want to make sure that big data is not an issue? Because the thing is that it’s thrown around in the context that big data is a problem, a buzzword. But how do we at Berkeley make that a non-buzzword? 

Big data should be just a way of life. How do we get to that point?”