Anna Sackmann

NEW DMPTool Launched!

Posted on March 9, 2018 (updated March 15, 2018) by Anna Sackmann

A shiny new version of the DMPTool was launched at the end of February. The big change, beyond the new color scheme and layout, is that it is now a single source platform for all DMPs. It now incorporates the codebase from other instances of the program from all over the world, including: DMPTuuli (Finland), DMP Melbourne (Australia), DMP Assistant (Canada), DMPOnline (Europe), and many more! The move was made to combine all platforms into one in order to focus on best practices at an international level. Please learn more about the new instance by visiting the DMPTool Blog.

Love Data Week 2018!

Posted on February 7, 2018 (updated February 13, 2018) by Anna Sackmann

The University Library, Research IT, and Berkeley Institute for Data Science invite faculty, students, and staff to a series of events on February 12th-16th during Love Data Week 2018. Love Data Week is a nationwide campaign designed to raise awareness about data visualization, management, sharing, and preservation.

Please join us to learn about the data services that the campus provides and discover options for managing and publishing your data. Graduate students, researchers, librarians, and data specialists are invited to attend these events to gain hands-on experience, learn about resources, and engage in discussion around researchers’ data needs at different stages of their research process.

To register for these events and find out more, please visit: http://guides.lib.berkeley.edu/ldw2018guide

Schedule:

Intro to Scopus APIs – Learn about the different types of APIs Scopus offers and how to get data from them. In the first hour, learn about the portal, what the API can do, and different use cases. Following a short break, the instructor will take the group through live queries, show how to test code, provide tips and tricks, and leave the group with sample code to work with. Attendees will be able to follow up with the instructor via webinar to troubleshoot and ask further questions about specific projects. Register here.

01:00 – 03:00 p.m., Tuesday, February 13, Doe Library, Room 190 (BIDS)

Refreshments will be provided.

Data stories and Visualization Panel – Learn how data is being used in creative and compelling ways to tell stories. Researchers across disciplines will talk about their successes and failures in dealing with data.

01:00 – 02:45 p.m., Wednesday, February 14, Doe Library, Room 190 (BIDS)

Refreshments will be provided.

Planning for & Publishing your Research Data – Learn why and how to manage and publish your research data as well as how to prepare a data management plan for your research project.

01:00 – 02:00 p.m., Thursday, February 15, Doe Library, Room 190 (BIDS)

We hope to see you there!

Alice Fan, MD: New Cancer Therapies & Women in Science

Posted on November 30, 2017 by Anna Sackmann

The last Nanoscale Science and Engineering (NSE) seminar of the semester is scheduled for Friday, December 1st from 2:00 – 3:00 in 180 Tan Hall. Alice Fan, from the Stanford Medical School, will be speaking on new nanoimmunoassays that enable the isolation and analysis of tumor cells. Following her talk, the Graduate Women in Engineering (GradSWE) will host a coffee hour from 3:30-4:30 in 242 Sutardja Dai Hall.

Engineering Academic Challenge!

Posted on September 22, 2017 by Anna Sackmann

The Elsevier Engineering Academic Challenge is back! The team-based challenge lasts for five weeks and began on September 18. Register here to get started and win prizes!

UPDATE: Elsevier Data Publishing Requirements

Posted on September 19, 2017 by Anna Sackmann

Last spring, we posted about data publishing requirements from Elsevier, Springer/Nature, and AAAS. At the time, Elsevier was the most lenient in its data publishing policies and used language that merely suggested and encouraged data publishing. As of September 5th, 2017, that is no longer the case. Elsevier has signed on to the Transparency and Openness Promotion (TOP) Guidelines through the Center for Open Science. We talk and write a lot about transparency, openness, and sharing in science; however, there is a disconnect between the conversations and the daily workflows and practice of scientists. I was once told, after giving a workshop on data sharing, that I was an idealist trying to preach to realists. In order to close that gap, we need more publishers, like Elsevier, to make the ideal a reality and enforce strict guidelines on data sharing and publishing.

Let’s take a look at the 5 new data sharing requirements, which will be implemented for 1800 of Elsevier’s titles:

Option A: you are encouraged to

deposit your research data in a relevant repository
cite this dataset in your article

Option B: you are encouraged to

deposit your research data in a relevant repository
cite this dataset in your article
link this dataset in your article
if you can’t do this, be prepared to explain why!

Option C: you are required to

deposit your research data in a relevant repository
cite this dataset in your article
link this dataset in your article
if you can’t do this, be prepared to explain why!

Option D: you are required to

deposit your research data in a relevant repository
cite this dataset in your article
link this dataset in your article

Option E: you are required to

deposit your research data in a relevant repository
cite this dataset in your article
link this dataset in your article
peer reviewers will review the data prior to publication

The new Elsevier policy is similar in nature to Springer Nature’s tiered system of requirements. It’s important to check with your individual journal to see which option it falls under. Ideally, you will always follow option E, where you make your data openly available, cited, and linked, and provide the proper amount of metadata for it to go through the peer review process or be reused by another researcher.
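
For those who like to keep a checklist alongside their code, the five options above can be written down as a small lookup table. This is only a sketch of the tiers as summarized in this post, not an official Elsevier schema; the names `DataOption`, `OPTIONS`, and `checklist` are hypothetical.

```python
from dataclasses import dataclass

# Steps as summarized in this post (wording is ours, not Elsevier's).
DEPOSIT = "deposit your research data in a relevant repository"
CITE = "cite this dataset in your article"
LINK = "link this dataset in your article"
EXPLAIN = "be prepared to explain why if you can't do this"
REVIEW = "expect peer reviewers to review the data prior to publication"


@dataclass(frozen=True)
class DataOption:
    """One tier of the policy: is it required, and what does it ask for?"""
    required: bool  # False means the steps are only encouraged
    steps: tuple


# Hypothetical lookup table mirroring options A through E above.
OPTIONS = {
    "A": DataOption(False, (DEPOSIT, CITE)),
    "B": DataOption(False, (DEPOSIT, CITE, LINK, EXPLAIN)),
    "C": DataOption(True, (DEPOSIT, CITE, LINK, EXPLAIN)),
    "D": DataOption(True, (DEPOSIT, CITE, LINK)),
    "E": DataOption(True, (DEPOSIT, CITE, LINK, REVIEW)),
}


def checklist(option):
    """Return the steps for a journal's option, labeled required or encouraged."""
    tier = OPTIONS[option.upper()]
    label = "required" if tier.required else "encouraged"
    return [f"{label}: {step}" for step in tier.steps]


if __name__ == "__main__":
    for line in checklist("E"):
        print(line)
```

Once you know which option your journal falls under, `checklist("C")` (for example) gives you the list to work through before submission.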

If you have any questions about how to enrich the metadata of your dataset, or where to deposit your research data, please email researchdata@berkeley.edu!

Maps & More: Hamilton!

Posted on September 11, 2017 (updated September 13, 2017) by Anna Sackmann

Please mark your calendars for the first Maps and More pop-up exhibit of the semester!

Hamilton, in Maps

Friday, September 22, 11 am – noon

Earth Sciences & Map Library, 50 McCone Hall

You’ve listened to the musical, now put some names to places with maps related to Alexander Hamilton’s life and exploits. This month’s Maps and More collections show-and-tell event is offered in coordination with the On the Same Page program. Featuring maps and atlases from the Earth Sciences & Map Library collection, this exhibit helps put some geographic context to key events in Hamilton, from his birth in the West Indies to his years in Philadelphia and New York and his deadly duel on the banks of the Hudson.

We’re delighted to have history graduate student Nicole Viglini guest curating this pop-up exhibit. Nicole’s research interests include themes of race, culture, class, and gender in early and nineteenth-century America and the Caribbean.

We hope to see you there!

Susan Powell

Sam Teplitzky

ps: Save the date for next month’s Maps and More on Wednesday, Oct. 18, 11 am – noon with mapmaker Stace Wright of Eureka Cartography!

Data Practices and Publishing Workshop Series

Posted on August 30, 2017 by Anna Sackmann

On Tuesday, September 5th and Tuesday, September 12th, the Kresge Engineering Library and Research Data Management will be holding a series of two data management workshops designed for researchers who are in the midst of navigating the research data lifecycle.

research data lifecycle

https://www.jisc.ac.uk/guides/research-data-management

During the first workshop, Efficient Research Data Practices, we’ll tear apart the above cycle and identify where each attendee currently falls in the data lifecycle. We’ll address pitfalls, tips, and tools for each step of the process that includes creating data management plans; setting up secure storage for the active data management phase; and how to prepare your data for publication while adhering to publisher and funder requirements.

The second workshop, Data Sharing: Publishing and Archiving, will take a deep dive into metadata creation and preparing data for publication and archiving. We’ll discuss why data publication is so important and we’ll identify individual publisher requirements for datasets. Daniella Lowenberg, formerly a publication manager for PLoS and now the Research Data Specialist for the California Digital Library, will be joining us.

Please register for the workshops by clicking on the links below. We look forward to seeing you!

Efficient Research Data Practices: September 5, 4:00 – 5:00, Kresge Engineering Library – 110MD Bechtel Engineering Center

Data Sharing: Publishing and Archiving: September 12, 4:00 – 5:00, Kresge Engineering Library – 110MD Bechtel Engineering Center

Overleaf and ShareLaTeX – Joining Forces!

Posted on August 9, 2017 by Anna Sackmann

Overleaf and ShareLaTeX, online collaborative LaTeX editors, will soon be merging into one platform that draws on their individual strengths. Both tools emerged around the same time in 2012 and have since seen incredible growth, with users embracing them as useful, long-term tools. In January 2017, the UC-Berkeley Library subscribed to both in order to provide our researchers and students with the pro account features of each. Both tools enable users to collaborate with groups and individuals on documents; simplify file directories; provide real-time previews; quickly identify errors; and provide access to excellent training tools and hundreds of templates, from publishers and for different types of documents, not just articles. If you regularly write documents in LaTeX, consider integrating one of these tools into your workflow. Both of them integrate with citation management software and with git or GitHub, and both provide revision history. Overleaf and ShareLaTeX contribute to a research workflow environment of transparency and preservation, both of which lend themselves well to sharing and revisiting data and notation by others or your future self.

Individually, ShareLaTeX and Overleaf have focused on developing different strengths.

Overleaf:

WYSIWYG editor
publisher relationships for streamlined submission process
integration with Mendeley (which we also have an institutional subscription to!)

ShareLaTeX:

track changes feature
robust real-time collaborative editing environment
syncing to GitHub (ask about UC-Berkeley’s instance!)

The merger of the two platforms will focus on bringing together the strongest components of each tool. For now, you can continue to create accounts on either platform and continue with your work. The founders of ShareLaTeX and Overleaf would like input from their users through this survey.

In the meantime, please join us at the Kresge Engineering Library to learn more about LaTeX and how to write in ShareLaTeX and Overleaf. We will be holding three workshops at the beginning of fall semester, in the Kresge Engineering Library Training Room:

August 24th, 4:00 – 5:00: Introduction to LaTeX

August 31st, 4:00 – 5:00: Typesetting in Math

September 7th, 4:00 – 5:00: Creating Tables, Figures, and Bibliographies

Please register through this form.

Please let us know if you have any questions about the Overleaf and ShareLaTeX merger, or the upcoming workshops.

GitHub: Archiving and Repositories

Posted on June 6, 2017 by Anna Sackmann

GitHub has become ubiquitous in the coding world and, with the advent of data science and computation in a slew of other disciplines, researchers are turning to the version control repository and hosting service. Google uses it, Microsoft uses it, and it’s on the list of the top 100 most popular sites on Earth. As a librarian and a member of the Research Data Management team, I often get the question: “Can I archive my code in my GitHub repository?” From the research data management perspective, the answer is a little sticky.

The terms “archive” and “repository” mean something very different on GitHub than they do from a research data management perspective. For example, in GitHub, a repository “contains all of the project files…and stores each file’s revision history.” Archiving content on GitHub means that your repository will stay on GitHub until you choose to remove it (or until GitHub receives a DMCA takedown notice, or the repository violates their guidelines or terms of service).

For librarians, research data managers, and many funders and publishers, archiving content in a repository entails more stringent requirements. For example, Dryad, a commonly known repository, requires those who wish to remove content to go through a lengthy process proving that the work has been infringed or is not in compliance with the law (read more about removing content from Dryad here). Most importantly, Dryad (and many other repositories) take specific steps to preserve the research materials. For example:
* persistent identification
* fixity checks
* versioning
* multiple copies are kept in a variety of storage sites

A good repository provides persistent access to materials, enables discovery, and, while it cannot guarantee against data loss, takes multiple steps to prevent it.
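
To make the idea of a fixity check from the list above more concrete, here is a minimal sketch in Python using only the standard library. It is not how Dryad or any other repository actually implements fixity; the function names (`sha256_of`, `build_manifest`, `check_fixity`) are hypothetical, and SHA-256 is simply one common choice of checksum. The idea is to record a checksum for every file at deposit time, then recompute the checksums later and flag anything that has changed or gone missing.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative to `directory`) to its checksum."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(directory, manifest):
    """Return the relative paths that are missing or whose checksum changed."""
    root = Path(directory)
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

In practice you would save the manifest next to your data (for example as a JSON file) and rerun `check_fixity` on a schedule; an empty list means every file still matches its recorded checksum.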

So, how can you continue to work efficiently through GitHub and adhere to good archival practices? GitHub links up with Zenodo, a repository based out of CERN. Data files are stored at CERN with another site in Budapest. All data is backed up on a daily basis with regular fixity and authenticity checks. Zenodo assigns a digital object identifier to your code, making it persistently identifiable and discoverable. Check out this guide on Making Your Code Citable for more information on linking your GitHub with Zenodo. Zenodo isn’t perfect and there are a few limitations, including a max file size of 50 GB. Read more about their policies here.

UC-Berkeley has its own institutional version of GitHub, which means that Berkeley development teams and individual contributors can now have private repositories (and private, shared repositories within the Berkeley domain). If you’d like access, please email github@berkeley.edu. Additionally, we have institutional subscriptions to Overleaf and ShareLaTeX, both of which integrate with GitHub.

Please contact researchdata@berkeley.edu if you’d like more information about archiving your code on GitHub.

Elsevier, Springer Nature, and AAAS: Publisher Research Data Policies

Posted on May 4, 2017 (updated May 5, 2017) by Anna Sackmann

Ever since the Office of Science and Technology Policy (OSTP) introduced a policy addressing the public’s access to data, federal granting agencies, non-profit granting agencies (like the Gates Foundation), publishers, universities, and researchers have been adjusting to reflect changes in access to data at the national level. The policy requires federal agencies with over $100 million in annual research and development expenses to make research results public and provide a plan for doing so.

As a researcher, this is a difficult landscape to navigate for a number of reasons:

you may have entered into a research project mid-grant and are unaware of the data management plan that was included in the grant proposal
the data management plan that was included in the grant application is not being followed
you’re not sure how funder mandates line up with publisher requirements
the language that publishers include about data sharing or publishing isn’t straightforward
you know that you’re supposed to make your data public, but you don’t know where to do this or how to do this

There are a number of other obstacles that make data publishing difficult, but for today, let’s take a look at the data sharing policies of three publishers in the Engineering and Physical Sciences. Publishers will often use suggestive or idealistic language, but does that mean you’re off the hook for sharing? If your publisher requires that you make your data public, how do you comply with your funder data mandate and your publisher data policy?

Elsevier is a massive publisher that currently publishes over 49,000 journals in Health, Life Sciences, Physical Sciences and Engineering, and Social Sciences and Humanities. They also publish books and major reference works and, somewhat recently, acquired Mendeley, a citation management tool. Their most recent product, Mendeley Data, is a cloud-based repository for datasets. To sum it up – Elsevier is huge. They’ve divided their research data policy into two parts – Principles (the expectations, “shoulds,” and “needs” underpinning their research data policy) and Policy (what they actually do). Elsevier’s principles are idealistic and sound great, while their policies are merely suggestive.

For example, one of Elsevier’s Data Sharing Principles:

“Research data should be made available free of charge to all researchers wherever possible and with minimal reuse restrictions.”

Policy:

“We will encourage and support researchers and research institutions to share data where appropriate and at the earliest opportunity.”

In their Research data FAQ section they answer the question:

“Is it compulsory to share my research data?”

A: No.

They’ve taken an interesting approach that sets up researchers to share their data (if prepared to do so), without being prescriptive. Elsevier makes it easy to link to datasets in other repositories, and has even started their own repository with Mendeley Data (that’s another blog post for another day). Elsevier has also jumped into the data journal game, with their open access Data in Brief publication. Data publications are emerging as a way for researchers to write an additional article that provides an in-depth description of datasets behind research. This article format provides data, which is typically buried in supplementary material, another avenue for discovery.

Imagine what could happen to the world of data sharing if a research giant like Elsevier made their policies less like principles and required research data sharing instead of suggesting it.

Springer Nature was formed by the merger of Springer and the Nature Publishing Group, announced in January of 2015. The new publishing giant produces about 13% of the papers in the scholarly publishing market, still behind Elsevier (23%) (Scholarly Kitchen). About a year after the merger, the new publisher developed an approach to research data policies that would allow them to remain flexible across their wide range of journals.

Four different policy types:

data sharing and data citation are encouraged
data sharing and evidence of data sharing encouraged
data sharing encouraged and statements of data availability required
data sharing, evidence of data sharing and peer review of data required

The Springer Nature approach allows for flexibility and takes into account the current practices of each discipline the publisher supports. However, prior to submission, you need to know which policy your Springer Nature journal follows (yet another argument for following good data management practices from the start). Let’s take a closer look at each policy.

Research Data Policy Type 1 is the most lenient, encouraging data citation and sharing. I like to think of policy 1 as “data sharing lite,” because Springer Nature provides you with information about how to share and cite data, but you don’t necessarily have to. A few titles that fit into this category are: Academic Questions, Accreditation and Quality Assurance, Aesthetic Plastic Surgery, Contemporary Islam, and Journal of Happiness Studies.
Research Data Policy Type 2 requires the authors to be more open with their relevant raw data by implying that the data will be available to any researcher who would like to reuse them for non-commercial purposes (barring confidentiality issues). This policy falls somewhere between “optional” and “mandatory.” The publisher is telling readers of its policy 2 journals that this data is freely available for them to reuse, thereby warning, or preparing, the authors that they may be asked for their data. The easiest way to handle requests like this is to make the data publicly available in a repository, with a citation and an assigned digital object identifier. A few examples of type 2 journals include: Agronomy for Sustainable Development, BioEnergy Research, Brain Imaging and Behavior, and Journal of Geovisualization and Spatial Analysis.
Research Data Policy Type 3 is geared specifically toward journals that publish research on the life sciences. When authors submit to policy 3 journals, they are strongly encouraged to deposit data in repositories. It is implied that all raw data is freely available (again, barring confidentiality issues) to any researcher who requests it. For policies 1 and 2, authors may deposit data in general repositories. However, for policy 3, researchers must deposit specific types of data in a list of prescribed repositories. For example, DNA and RNA sequencing data must be deposited in the NCBI Trace Archive or the NCBI Sequence Read Archive (SRA). A few examples of type 3 journals include: Journal of Hematology and Oncology, Nature Cell Biology, and Nature Chemistry.
Research Data Policy Type 4 requires that all of the datasets supporting the paper’s conclusions be available to reviewers and readers. The datasets have to be available in repositories prior to the peer review process (or be made available in supplementary material), and publication is conditional upon the data being in the appropriate repository. Examples of type 4 journals include BMC Biology, Genome Biology, and Retrovirology.

AAAS, the American Association for the Advancement of Science, is much smaller in scope than Springer Nature and Elsevier. AAAS is both a professional society and a reputable publisher of six journals: Science, Science Translational Medicine, Science Signaling, Science Advances, Science Immunology, and Science Robotics. Unlike the other two publishers, AAAS can set tight and strict policies surrounding research data because they publish a small percentage of what the other two produce. Datasets must be deposited in approved repositories with an accession number prior to publication. AAAS encourages compliance with MIBBI (Minimum Information for Biological and Biomedical Investigations) guidelines. AAAS provides a list of approved repositories based on data type (similar to Springer Nature type 4). AAAS stipulates not only that data must be available, but also that all materials necessary to understand and assess the research must be made available. This includes code, patents, and even fossils or rare specimens. Please see AAAS’s publication policies for more information.

These publishers are ordered on a scale from “suggestive” and “encouraging” data policies (Elsevier) to strict mandates for sharing research materials (AAAS). Ultimately, you should prepare your data and supporting research materials, like code, from the beginning of a research project as if you were going to publish in a AAAS journal. There are more reasons to do so than following publisher data sharing mandates, which I’ll explore in future posts.