Science & Engineering Libraries

ASHRAE Standards and Guidelines: now online!

Posted on July 10, 2017 by Lisa Ngo

UC Berkeley researchers now have full online access to standards issued by the American Society of Heating, Refrigerating and Air Conditioning Engineers (ASHRAE). ASHRAE Standards and Guidelines are widely used by researchers and professionals in the design and maintenance of indoor environments and those interested in refrigeration processes. Access is provided through the Techstreet Enterprise platform and requires the proxy or VPN from off-campus.

In addition to the Standards and Guidelines, ASHRAE also publishes a series of Transactions and Handbooks. Interested in other ASHRAE publications? Check OskiCat for access or contact a librarian for help!

GitHub: Archiving and Repositories

Posted on June 6, 2017 by Anna Sackmann

GitHub has become ubiquitous in the coding world and, with the advent of data science and computation in a slew of other disciplines, researchers are turning to the version control repository and hosting service. Google uses it, Microsoft uses it, and it’s on the list of the top 100 most popular sites on Earth. As a librarian and a member of the Research Data Management team, I often get the question: “Can I archive my code in my GitHub repository?” From the research data management perspective, the answer is a little sticky.


The terms “archive” and “repository” from GitHub mean something very different than their definitions from a research data management perspective. For example, in GitHub, a repository “contains all of the project files…and stores each file’s revision history.” Archiving content on GitHub means that your repository will stay on GitHub until you choose to remove it (or until GitHub receives a DMCA takedown notice, or the repository violates their guidelines or terms of service).

For librarians, research data managers, and many funders and publishers, archiving content in a repository involves more stringent requirements. For example, Dryad, a commonly known repository, requires those who wish to remove content to go through a lengthy process proving that the work infringes on someone's rights or is not in compliance with the law (read more about removing content from Dryad here). Most importantly, Dryad (like many other repositories) takes specific steps to preserve the research materials. For example:
* persistent identification
* fixity checks
* versioning
* multiple copies are kept in a variety of storage sites

A good repository provides persistent access to materials, enables discovery, and, while it cannot guarantee against data loss, takes multiple steps to prevent it.
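
To make one of the preservation steps above concrete, here is a minimal sketch of a fixity check in Python. It assumes a SHA-256 checksum was recorded when the file was deposited; the function names `sha256_of` and `fixity_ok` are hypothetical and chosen for illustration, and this is not a description of how Dryad or any other repository implements its checks.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_digest: str) -> bool:
    """Compare the file's current digest with the digest recorded at deposit."""
    return sha256_of(path) == recorded_digest.lower()
```

The idea is to record the digest once at deposit and recompute it on a schedule: a mismatch signals that the stored file has changed or been corrupted, and a clean copy should be restored from one of the other storage sites.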

So, how can you continue to work efficiently through GitHub and adhere to good archival practices? GitHub links up with Zenodo, a repository based out of CERN. Data files are stored at CERN with another site in Budapest. All data is backed up on a daily basis with regular fixity and authenticity checks. Zenodo assigns a digital object identifier to your code, making it persistently identifiable and discoverable. Check out this guide on Making Your Code Citable for more information on linking your GitHub with Zenodo. Zenodo isn’t perfect and there are a few limitations, including a maximum file size of 50 GB. Read more about their policies here.

UC Berkeley has its own institutional version of GitHub, which means that Berkeley development teams and individual contributors can now have private repositories (and private, shared repositories within the Berkeley domain). If you’d like access, please email github@berkeley.edu. Additionally, we have institutional subscriptions to Overleaf and ShareLaTeX, both of which integrate with GitHub.

Please contact researchdata@berkeley.edu if you’d like more information about archiving your code on GitHub.

Elsevier, Springer Nature, and AAAS: Publisher Research Data Policies

Posted on May 4, 2017 (updated May 5, 2017) by Anna Sackmann

Ever since the Office of Science and Technology Policy introduced a policy addressing the public’s access to data, federal granting agencies, non-profit granting agencies (like the Gates Foundation), publishers, universities, and researchers have been adjusting to reflect changes in access to data at the national level. The policy requires federal agencies with over $100 million in annual research and development expenses to make research results public and provide a plan for doing so.

As a researcher, this is a difficult landscape to navigate for a number of reasons:

* you may have entered into a research project mid-grant and are unaware of the data management plan that was included in the grant proposal
* the data management plan that was included in the grant application is not being followed
* you’re not sure how funder mandates line up with publisher requirements
* the language that publishers include about data sharing or publishing isn’t straightforward
* you know that you’re supposed to make your data public, but you don’t know where or how to do this

There are a number of other obstacles that make data publishing difficult, but for today, let’s take a look at the data sharing policies of three publishers in the Engineering and Physical Sciences. Publishers will often use suggestive or idealistic language, but does that mean you’re off the hook for sharing? If your publisher requires that you make your data public, how do you comply with your funder data mandate and your publisher data policy?

Elsevier is a massive publisher that currently publishes over 49,000 journals in Health, Life Sciences, Physical Sciences and Engineering, and Social Sciences and Humanities. They also publish books and major reference works and, somewhat recently, acquired Mendeley, a citation management tool. Their most recent product, Mendeley Data, is a cloud-based repository for datasets. To sum it up – Elsevier is huge. They’ve divided their research data policy into two parts – Principles (the expectations, “shoulds,” and “needs” underpinning their research data policy) and Policy (what they actually do). Elsevier’s principles are idealistic and sound great, while their policies are merely suggestive.

For example, one of Elsevier’s Data Sharing Principles:

“Research data should be made available free of charge to all researchers wherever possible and with minimal reuse restrictions.”

Policy:

“We will encourage and support researchers and research institutions to share data where appropriate and at the earliest opportunity.”

In their Research data FAQ section they answer the question:

“Is it compulsory to share my research data?”

A: No.

They’ve taken an interesting approach that sets up researchers to share their data (if prepared to do so), without being prescriptive. Elsevier makes it easy to link to datasets in other repositories, and has even started their own repository with Mendeley Data (that’s another blog post for another day). Elsevier has also jumped into the data journal game, with their open access Data in Brief publication. Data publications are emerging as a way for researchers to write an additional article that provides an in-depth description of datasets behind research. This article format provides data, which is typically buried in supplementary material, another avenue for discovery.

Imagine what could happen to the world of data sharing if a research giant like Elsevier made their policies less like principles and required research data sharing instead of suggesting it.

Springer Nature was formed when Springer and the Nature Publishing Group announced a merger in January of 2015. The new publishing giant produces about 13% of the papers in the scholarly publishing market, still behind Elsevier (23%) (Scholarly Kitchen). About a year after the merger, the new publisher developed an approach to research data policies that would allow them to remain flexible across their wide range of journals.

Four different policy types:

1. data sharing and data citation are encouraged
2. data sharing and evidence of data sharing encouraged
3. data sharing encouraged and statements of data availability required
4. data sharing, evidence of data sharing and peer review of data required

The Springer Nature approach allows for flexibility and takes into account the current practices of each discipline the publisher supports. However, prior to submission, you need to know which policy your Springer Nature journal follows (yet another argument for following good data management practices from the start). Let’s take a closer look at each policy.

Research Data Policy Type 1 is the most lenient by encouraging data citation and sharing. I like to think of policy 1 as “data sharing lite,” because Springer Nature provides you with information about how to share and cite data, but you don’t necessarily have to. A few titles that fit into this category are: Academic Questions, Accreditation and Quality Assurance, Aesthetic Plastic Surgery, Contemporary Islam, and Journal of Happiness Studies.
Research Data Policy Type 2 requires the authors to be more open with their relevant raw data by implying that the data will be available to any researcher who would like to reuse them for non-commercial purposes (barring confidentiality issues). This policy falls somewhere between “optional” and “mandatory.” The publisher is telling readers of its policy 2 journals that this data is freely available for them to reuse, thereby warning, or preparing, the authors that they may be asked for their data. The easiest way to handle requests like this is to make the data publicly available in a repository, with a citation and an assigned digital object identifier. A few examples of type 2 journals include: Agronomy for Sustainable Development, BioEnergy Research, Brain Imaging and Behavior, and Journal of Geovisualization and Spatial Analysis.
Research Data Policy Type 3 is geared specifically for journals that publish research on the life sciences. When an author submits to policy 3 journals, they are strongly encouraged to deposit data in repositories. It is implied that all raw data is freely available (again, barring confidentiality issues) to any researcher who requests it. For policies 1 and 2, authors may deposit data in general repositories. However, for policy 3, researchers must deposit specific types of data in a list of prescribed repositories. For example, DNA and RNA sequencing data must be deposited in the NCBI Trace Archive or the NCBI Sequence Read Archive (SRA). A few examples of type 3 journals include: Journal of Hematology and Oncology, Nature Cell Biology, and Nature Chemistry.
Research Data Policy Type 4 requires that all of the datasets supporting the paper’s conclusions be available to reviewers and readers. The datasets have to be available in repositories prior to the peer review process (or be made available in supplementary material), and publication is conditional upon the data being in the appropriate repository. Examples of type 4 journals include BMC Biology, Genome Biology, and Retrovirology.

AAAS, the American Association for the Advancement of Science, is much smaller in scope than Springer Nature and Elsevier. AAAS is both a professional society and a reputable publisher of six journals: Science; Science Translational Medicine; Science Signaling; Science Advances; Science Immunology; and Science Robotics. Unlike the other two publishers, AAAS can set tight and strict policies surrounding research data because they publish a small percentage of what the other two produce. Datasets must be deposited in approved repositories with an accession number prior to publication. AAAS encourages compliance with MIBBI (Minimum Information for Biological and Biomedical Investigations) guidelines. AAAS provides a list of approved repositories based on data type (similar to Springer Nature type 4). Not only does AAAS stipulate that data must be available, but also that all materials that are necessary to understand and assess the research must be made available. This includes code, patents, and even fossils or rare specimens. Please see AAAS’s publication policies for more information.

These publishers are ordered here on a scale from “suggestive” and “encouraging” data policies (Elsevier) to strict mandates for sharing research materials (AAAS). Ultimately, you should prepare your data and supporting research materials, like code, from the beginning of a research project as if you were going to publish in a AAAS journal. There are more reasons to do that than following publisher data sharing mandates, which I’ll explore in future posts.

Virtual Reality for Cal Day

Posted on April 20, 2017 by Anna Sackmann

The Kresge Engineering Library will be one of the host sites for VR @ Berkeley, a student group that brings virtual reality to the campus community. By working with industry and UC Berkeley researchers, VR @ Berkeley makes virtual reality an accessible experience. Each year, members of the group focus on a wide range of projects that explore the intersection between our physical realities and the virtual. Their work spans many applications, including changing the way we read and interact with textbooks, allowing medical workers in the field to communicate with doctors in a more intuitive manner, and a virtual experience of our iconic, 61-bell Campanile.


During Cal Day, the Kresge Engineering Library will be hosting Project Landships, a multiplayer tank combat simulator. Players can work together as a crew to aim, shoot, drive, and spot. The experience emulates a WWII Sherman Firefly Tank.

Check out other VR @ Berkeley Projects on Cal Day at the following locations:
1. Kresge Engineering Library
2. ESS Patio
3. Jacobs Hall
4. Sproul Plaza
5. The House (Bancroft)
6. Moffitt Library

Advanced PubMed workshop

Posted on April 4, 2017 by Elliott Smith


Want to make your searches for biomedical information more effective and efficient? The Library’s Life and Health Sciences Division is holding a hands-on workshop on advanced features of PubMed, including:

How to use filters to focus search results on specific article types, publication dates and more
How to add field tags to find articles by author, title, journal, and other criteria
How Medical Subject Headings (MeSH) can help you find additional relevant information
How to use My NCBI to save searches, set up alerts, and display results in your preferred format
How PubMed links to information in other NCBI resources
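
For a taste of what field tags look like in practice, here is a small Python sketch that assembles a tagged PubMed search string and the matching NCBI E-utilities `esearch` URL. The tags `[au]`, `[ti]`, and `[ta]` (author, title, journal) are standard PubMed field tags; the helper names and the search terms themselves are hypothetical examples, not material from the workshop. The script only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(term: str, tag: str) -> str:
    """Attach a PubMed field tag to a search term."""
    return f"{term}[{tag}]"


def build_query(*clauses: str) -> str:
    """Join clauses with AND, written in upper case as PubMed expects."""
    return " AND ".join(clauses)


def esearch_url(query: str, retmax: int = 20) -> str:
    """Build an esearch URL for the PubMed database without sending it."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


# Hypothetical search: author, title word, and journal abbreviation.
query = build_query(
    tagged("doe j", "au"),
    tagged("zebrafish", "ti"),
    tagged("nature", "ta"),
)
print(query)
print(esearch_url(query))
```

The same tagged string can be pasted directly into the PubMed search box.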

Location: Bioscience Library Training Room, 2101 VLSB
Date: Wednesday, April 12
Time: 12 – 1 pm

No pre-registration is required; all are welcome.

Questions? Please contact Elliott Smith at esmith@library.berkeley.edu

EndNote (X8): Workshop

Posted on March 29, 2017 by Elliott Smith

EndNote (X8): Citation & Document Manager: Hands-on Workshop

Use EndNote to manage your documents, organize your citations, import from databases, add PDFs, insert footnotes into your Word docs, and format bibliographies in any style (for Windows and Mac).

This hands-on workshop will cover getting started, adding citations, getting full text, inserting footnotes, and creating bibliographies.
The Bioscience Library Training room is equipped with PCs and EndNote X8; you are also welcome to bring your own laptop.

Open to all interested students and researchers; no registration is required.
Questions? Contact skoskine@library.berkeley.edu

Date: Tuesday, April 4.
Time: 4 – 5 pm
Location: Bioscience Library Training room, 2189 VLSB (inside the library).

Write. Cite. Repeat.

Posted on March 15, 2017 by UC Berkeley Library


Looking for an easy way to manage your research? The Library has you covered. We now offer premium access to three products — Overleaf, Mendeley, & ShareLaTeX — that make collaborative writing and citing in the engineering and physical sciences much easier. Sign up and learn more.

* Overleaf is an online collaborative LaTeX editor with integrated real-time preview. It offers templates for arXiv and many journal publishers to help get you started, and it can also be linked to other services such as Mendeley, Git, and Plot.ly. A pro account (available for free when you sign up with your Berkeley email) will provide up to 10GB storage space, 500 files per project, full project history, and the ability to save to Dropbox.
* ShareLaTeX is also an online collaborative LaTeX editor. It too offers templates for arXiv and many journal publishers. With a premium account, you will get unlimited collaborators, full project history, and the ability to sync with Dropbox and GitHub.
* Mendeley is a reference manager and academic social network that allows you to organize your references across multiple devices, automatically generate bibliographies, and share references with collaborators online. Your institutional account will provide up to 5GB personal library space, 20GB shared library space, 25 collaborators in private groups, and unlimited private groups.

CRCnetBASE: Online Science Books

Posted on March 15, 2017 (updated March 21, 2017) by Brian Quigley

Have you ever wished you could look up something in a scientific book when you are studying at home? If so, CRCnetBASE is the answer!

This online collection of books includes the following topics:

chemistry
engineering
environmental science
food science
math
neuroscience
statistics
and more!

You can search across all books, browse books by subject, and download PDFs of chapters. All the books can be found by searching OskiCat as well.

Research Tools Fair, Feb 17

Posted on February 11, 2017 (updated February 13, 2017) by Brian Quigley

Want to learn about tools to help you be more effective with research, writing, and citation management in the sciences? Join us for our first ever Research Tools Fair on Friday, February 17!

The Fair will consist of brief product demos in the morning followed by drop-in Q&A with vendors in the afternoon. The Fair is open to all but geared toward faculty and students in the physical sciences & engineering. Please drop by for any part of the day that interests you. Coffee & soft drinks will be provided.


Research Tools Fair
Date: February 17, 2017
Location: Engineering Library Training Room (110MD Bechtel)

Schedule:

10:00-10:30: ShareLaTeX (via web conference)
10:30-11:00: Mendeley
11:00-11:30: Overleaf
11:30-12:00: AccessEngineering & DataVis Material Properties
12:00-12:30: Geofacets
1:30-3:00: Drop-in Q&A with AccessEngineering, Engineering Village, Geofacets, Knovel, Mendeley, and Overleaf

Research Data Publishing & Licensing 101

Posted on February 9, 2017 (updated March 6, 2017) by Rachael Samberg

Please join Science Data & Engineering Librarian Anna Sackmann and Scholarly Communication Officer Rachael Samberg for practical tips about why, where, and how to publish and license your research data.

This workshop will be held on February 16, 2017, from 11 a.m.–12 p.m. in Doe Library, Rm. 190 (BIDS), as part of Love Your Data Week. Check out the reservation form!

Why Should We Care About Publishing Research Data?

Sharing research data promotes transparency, reproducibility, and progress. Indeed, it can spur new discoveries on a daily basis. It’s not atypical for geneticists, for example, to sequence by day and post research results the same evening—allowing others to begin using their datasets in nearly real time (see, for example, Pisani & AbouZahr’s paper about this data publishing cycle). The datasets researchers share can, in turn, inform business or regulatory policymaking, legislation, government or social services, and much more.

Publishing your research data can also increase the impact of your research, and with it, your scholarly profile. Depositing datasets in a repository makes them both visible and citable. You can include them in your CV and grant application biosketches. In turn, scholars around the world can begin working with your data and crediting you. As a result, sharing detailed research data can be associated with increased citation rates (check out this Piwowar et al. study, among others).

Publishing your data may also be required. Federal funders (e.g. National Institutes of Health), grant agencies (e.g. Bill & Melinda Gates Foundation), and journal publishers (e.g. PLoS and other journals listed in this Open Access Directory) increasingly require that datasets be made publicly available to readers—often immediately upon associated article publication.

How Do We Publish Data?

Merely uploading your dataset to a personal or departmental website won’t achieve these aims of promoting knowledge and progress. Datasets should be able to link seamlessly to any research articles they support. Their metadata should be compatible with bibliographic management and citation systems (e.g. CrossRef or RefWorks), and be formatted for crawling by abstracting and indexing services. After all, you want to be able to find other people’s datasets, manage them in your own reference manager, and cite them as appropriate. So, you’d want your own dataset to be positioned for the same discoverability and ease of use.

How can you achieve all this? It sounds daunting, but it’s actually pretty straightforward and simple. You’ll just want to select a data publishing tool or service that is built around both preservation and discoverability: It should offer you a stable location or DOI (which will provide a persistent link to your data’s location), help you create sufficient metadata to facilitate transparency and reproducibility, and optimize the metadata for search engines.

For instance, UC’s Dash tool is a terrific and easy-to-use solution that preserves and publishes your datasets. At the Feb. 16 workshop we’re hosting, you can learn more about how to prepare, describe, and upload your data for deposit and publishing with Dash and other tools.

We also recommend that, if your chosen publishing tool enables it, you should include your ORCID (a persistent digital identifier) with your datasets just like with all your other research. This way, your research and scholarly output will be collocated in one place, and it will become easier for others to discover and credit your work.
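
To give a rough sense of what discoverable metadata with a DOI and an ORCID can look like, here is a minimal Python sketch that builds a schema.org `Dataset` record as JSON-LD, a format that search engines can crawl. This is an illustration only: a repository normally generates this kind of record for you, the sketch does not claim to reflect what Dash produces, and the title, DOI, name, and ORCID below are placeholder values. The function name `dataset_jsonld` is hypothetical.

```python
import json


def dataset_jsonld(title, description, doi, license_url, author_name, orcid):
    """Build a schema.org Dataset record as a plain dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "description": description,
        # A DOI expressed as a URL gives a persistent link to the dataset.
        "identifier": f"https://doi.org/{doi}",
        "license": license_url,
        "creator": {
            "@type": "Person",
            "name": author_name,
            # The ORCID iD ties the dataset to the rest of the author's output.
            "@id": f"https://orcid.org/{orcid}",
        },
    }


# Placeholder values for illustration; none of these refer to a real deposit.
record = dataset_jsonld(
    title="Example stream temperature measurements",
    description="Hourly temperature readings from a hypothetical field site.",
    doi="10.5072/example-dataset",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    author_name="Jane Researcher",
    orcid="0000-0000-0000-0000",
)
print(json.dumps(record, indent=2))
```

Stating the license in the record also tells potential users up front what they are allowed to do with the data.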

What Does it Mean to License Your Data For Reuse?

Uploading a dataset—with good metadata, of course!—to a repository is not the end of the road for shepherding one’s research. We must also consider what we are permitting other researchers to do with our data. And, what rights do we, ourselves, have to grant such permissions—particularly if we got the data from someone else, or the datasets were licensed to us for a particular use?

To better understand these issues, we first have to distinguish between attribution and licensing. Citing datasets is an essential scholarly practice. But the issue of someone citing your data is separate from the question of whether it’s permissible for them to use the data in the first place. That is, what license for reuse have you applied to the dataset?

The type of reuse we can grant depends on whether we own our research data and hold copyright in it. There can be a number of possibilities here. For instance, sometimes the terms of contracts we’ve entered into (e.g. funder/grant agreements, website terms of use, etc.) dictate data ownership and copyright. Sometimes, our employers own our research data under our employment contracts (e.g. the research data is “work-for-hire”). In some cases, the dataset might not be copyrightable to begin with if it does not constitute original expression. We could land in hot water if we try to grant licenses to data for which we don’t actually hold rights. For an excellent summary addressing these “Who owns your data?” questions, including copyright issues, check out this blog post by Katie Fortney written for the UC system-wide Office of Scholarly Communication.

To try to streamline ownership and copyright questions, and promote data reuse, often data repositories will simply apply a particular “Creative Commons” license or public domain designation to all deposited datasets. For instance:

Dryad and BioMed Central repositories apply a Creative Commons Zero (CC0) designation to deposited data—meaning that, by depositing in those repositories, you are not reserving any copyright that you might have. Someone using your dataset still should cite the dataset to comply with scholarly norms, but you cannot mandate that they attribute you and cannot pursue copyright claims against them.
UC Dash applies a Creative Commons Attribution (CC-BY) license to datasets deposited by UC researchers. This means that someone using your Dash-deposited dataset not only should cite it to adhere to scholarly norms, but also is required to attribute you as the author.

What’s the Right License or Designation for Your Data?

Well, sometimes you don’t have a say in the matter, as your funding agreement or the repository you choose dictates the license applied. Otherwise, it’s worth considering what your goals are for sharing the data to begin with, and selecting a designation or license that both meets your needs and fits within whatever ownership and use rights you have over the data. Your Scholarly Communication Officer or librarian can help you with this.

Bear in mind that ambiguity surrounding the ability to reuse data inhibits the pace of research. So, try to identify clearly for potential users what rights are being granted in the dataset you publish.

How To Learn More if You’re a UC Berkeley Researcher

Come to the workshop, of course! For data publishing questions, contact the Research Data Management team at researchdata@berkeley.edu. With questions about data ownership, copyright, or licensing, contact the Library’s Scholarly Communication Officer at rsamberg@berkeley.edu. You can also check out the Research Data Management website for more on preserving and disseminating your data. In the meantime, we hope to see you at the workshop next week!

by Rachael Samberg in Scholarly Communications on February 9th, 2017