Love Data Week 2018 at UC Berkeley

The University Library, Research IT, and the Berkeley Institute for Data Science will host a series of events on February 12-16 during Love Data Week 2018. Love Data Week is a nationwide campaign designed to raise awareness about data visualization, management, sharing, and preservation.
Please join us to learn about the data services the campus provides and to discover options for managing and publishing your data. Graduate students, researchers, librarians, and data specialists are invited to attend these events to gain hands-on experience, learn about resources, and engage in discussion around researchers’ data needs at different stages of the research process.
To register for these events and find out more, please visit: http://guides.lib.berkeley.edu/ldw2018guide

Schedule:
Intro to Scopus APIs – Learn about working with APIs and how to use the Scopus APIs for text mining (a minimal example sketch follows this listing).
1:00 – 3:00 p.m., Tuesday, February 13, Doe Library, Room 190 (BIDS)
Refreshments will be provided.
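If you would like a head start before the workshop, here is a minimal sketch of querying the Scopus Search API from Python. It assumes you have registered for an Elsevier API key at https://dev.elsevier.com; the API key and query string below are illustrative placeholders, not workshop materials.

```python
# Minimal sketch: querying the Scopus Search API with the requests library.
# Assumes an Elsevier API key from https://dev.elsevier.com; the key and
# query below are illustrative placeholders.
import requests

API_KEY = "YOUR_API_KEY"  # hypothetical placeholder

response = requests.get(
    "https://api.elsevier.com/content/search/scopus",
    headers={"X-ELS-APIKey": API_KEY, "Accept": "application/json"},
    params={"query": 'TITLE-ABS-KEY("text mining")', "count": 5},
)
response.raise_for_status()

# Print the titles of the first few matching records.
for entry in response.json()["search-results"]["entry"]:
    print(entry.get("dc:title"))
```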

Data Stories and Visualization Panel – Learn how data is being used in creative and compelling ways to tell stories. Researchers across disciplines will talk about their successes and failures in dealing with data.
1:00 – 2:45 p.m., Wednesday, February 14, Doe Library, Room 190 (BIDS)
Refreshments will be provided.

Planning for & Publishing Your Research Data – Learn why and how to manage and publish your research data, as well as how to prepare a data management plan for your research project.
2:00 – 3:00 p.m., Thursday, February 15, Doe Library, Room 190 (BIDS)

Hope to see you there!

–Yasmin

Great talks and fun at csv,conf,v3 and Carpentry Training

On May 2-5, 2017, I (Yasmin AlNoamany) was thrilled to attend the csv,conf,v3 conference and the Software/Data Carpentry instructor training in Portland, Oregon, USA. It was a unique experience to attend and to speak with many people who are passionate about data and open science.

csv,conf,v3

The csv,conf is a conference for data makers from academia, industry, journalism, government, and open source. We had four amazing keynotes: Mike Bostock, the creator of D3.js (a JavaScript library for data visualization); Angela Bassa, the Director of Data Science at iRobot; Heather Joseph, the Executive Director of SPARC; and Laurie Allen, the lead of the Digital Scholarship Group at the University of Pennsylvania Libraries. The conference had four parallel sessions and a series of workshops about data. Check out the full schedule here.

On the second day, I presented on generating stories from archived data in a talk entitled “Using Web Archives to Enrich the Live Web Experience Through Storytelling”. Check out the slides of my talk below.

[Embedded slides: “Using Web Archives to Enrich the Live Web Experience Through Storytelling”]

I demonstrated the steps of the proposed framework, the Dark and Stormy Archives (DSA), in which we identify, evaluate, and select candidate Web pages from archived collections that summarize the holdings of those collections, arrange them in chronological order, and then visualize the pages using tools that users are already familiar with, such as Storify. For more information about this work, check out this post.
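To make the shape of that pipeline concrete, here is a schematic, self-contained Python sketch. The data model and quality scoring below are hypothetical placeholders for illustration, not the actual DSA implementation.

```python
# Schematic sketch of the selection-and-ordering pipeline described above.
# The ArchivedPage model and quality scoring are hypothetical placeholders,
# not the real DSA code.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ArchivedPage:
    url: str
    memento_datetime: datetime  # when the page was archived
    quality: float              # placeholder: how well the page represents the collection

def select_candidates(collection, k=20):
    """Identify and evaluate candidate pages, keeping the k highest-quality ones."""
    good = [p for p in collection if p.quality > 0.5]
    return sorted(good, key=lambda p: p.quality, reverse=True)[:k]

def build_story(collection):
    """Arrange the selected pages chronologically, ready for a tool like Storify."""
    story = sorted(select_candidates(collection), key=lambda p: p.memento_datetime)
    return [p.url for p in story]
```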

The csv,conf deserved to win the conference-of-the-year prize for bringing the CommaLlama, who brought much joy and happiness to all conference attendees. It was fascinating to be at csv,conf,v3 and to meet and hear from passionate data people from everywhere.

After the conference, Max Ogden from the Dat project gave us a great tour from the conference venue to South Portland. We had great food from street food trucks in Portland, then spent time with adorable neighborhood cats!

The Carpentry Training

After csv,conf, I spent two days with 30 other librarians and researchers from different backgrounds learning how to teach Data Carpentry, Software Carpentry, and Library Carpentry. Three CLIR fellows (John Borghi, Veronica Ikeshoji-Orlati, and myself) attended the training, which prepares attendees to teach Data Carpentry, Software Carpentry, and Library Carpentry lessons. The Carpentry community is a global movement for teaching scientists in different disciplines the computing skills they need to empower data-driven research and encourage open science.

The two days mixed lectures with hands-on exercises about learning philosophy and Carpentry teaching practices. It was a unique and fascinating experience. We had two energetic instructors, Tim Dennis and Belinda Weaver, who created a welcoming and collaborative environment for us. Check out the full schedule and lessons here.

Finally, I would like to acknowledge the support I received from the California Digital Library and the csv,conf committee, who gave me this amazing opportunity to attend and speak at csv,conf and to take the Carpentry instructor training. I am looking forward to applying what I learned in upcoming Carpentry workshops at UC Berkeley.

–Yasmin


Great talks and tips in Love Your Data Week 2017

This week, the University Library and the Research Data Management program were delighted to participate in the Love Your Data (LYD) Week campaign by hosting a series of workshops designed to help researchers, data specialists, and librarians better address and plan for research data needs. The workshops covered issues related to managing, securing, publishing, and licensing data. Participants from many campus groups (e.g., LBNL, CSS-IT) were eager to continue the stimulating conversation around data management. Check out the full program and information about the presented topics.

Photographs by Yasmin AlNoamany for the University Library.

The Securing Research Data Panel.

The first day of LYD Week at UC Berkeley kicked off with a discussion panel on Securing Research Data, featuring Jon Stiles (D-Lab, Federal Statistical RDC), Jesse Rothstein (Public Policy and Economics, IRLE), and Carl Mason (Demography). The discussion centered on the rewards and challenges of supporting groundbreaking research when the underlying research data is sensitive or restricted. In a lively debate, social science researchers detailed their experiences working with sensitive research data and highlighted what has worked and what has proved difficult.

Chris Hoffman presents Securing Research Data, a campus-wide project.

To close the session, Chris Hoffman, Program Director of the Research Data Management program, described a campus-wide project on Securing Research Data. Hoffman said the goals of the project are to improve guidance for researchers, benchmark other institutions’ services, and assess demand and make recommendations to campus. He asked the attendees for their input on the services the campus provides.

Attendees of the Securing Research Data panel ask questions about data protection.
Rick Jaffe and Anna Sackmann in the RDM Tools and Tips: Box and Drive workshop.

On the second day, Rick Jaffe (Research IT) and Anna Sackmann (UC Berkeley Library) hosted a workshop on best practices for using Box and bDrive to manage documents, files, and other digital assets. The workshop covered multiple aspects of using Box and bDrive, such as their key characteristics and their personal and collaborative features and tools (including permission controls, special-purpose accounts, pushing and retrieving files, and more). It also covered the differences between the commercial and campus (enterprise) versions of Box and Drive. Check out the RDM Tools and Tips: Box and Drive presentation.

Anna and Rick ask attendees to do a group activity to get them talking about their workflow.

We closed out LYD Week 2017 at UC Berkeley with a workshop about Research Data Publishing and Licensing 101. In the workshop, Anna Sackmann and Rachael Samberg (UC Berkeley’s Scholarly Communication Officer) shared practical tips about why, where, and how to publish and license your research data. (You can also read Samberg & Sackmann’s related blog post about research data publishing and licensing.)

Anna Sackmann talks about publishing research data at UC Berkeley.

In the first part of the workshop, Anna Sackmann talked about reasons to publish and share research data on both practical and theoretical levels. She discussed relevant data repositories that UC Berkeley and other entities offer, and provided criteria for selecting a repository. Check out Anna Sackmann’s presentation about Data Publishing.

Anna Sackmann differentiates between data repositories at UC Berkeley.
Rachael Samberg, UC Berkeley’s Scholarly Communication Officer.

During the second part of the presentation, Rachael Samberg explained the importance of licensing data for reuse, and how copyright and the agreements researchers enter into affect licensing rights and choices. She also distinguished between data attribution and licensing. Samberg noted that data licensing helps resolve ambiguity about permissions to use data sets and incentivizes others to reuse and cite data. At the end, she explained how people can license their data and advised UC Berkeley workshop participants to contact her with any questions about data licensing.

Rachael Samberg explains the difference between attribution and license.

Check out the slides from Rachael Samberg’s presentation about data licensing below.

[Embedded slides: Rachael Samberg’s presentation on data licensing]

The workshops received positive feedback from attendees, who also expressed interest in similar workshops covering the broader perspectives and skills needed to help researchers manage their data.


Yasmin AlNoamany

Special thanks to Rachael Samberg for editing this post.


Survey about “Understanding researcher needs and values about software”

Software is as important as data when it comes to building upon existing scholarship. However, while there has been a small amount of research into how researchers find, adopt, and credit software, there is currently a lack of empirical data on how researchers use, share, and value software and computer code.

The UC Berkeley Library and the California Digital Library are investigating researchers’ perceptions, values, and behaviors around the software generated as part of the research process. If you are a researcher, we would appreciate it if you could help us understand your current practices related to software and code by spending 10-15 minutes completing our survey. We are aiming to collect responses from researchers across different disciplines. Survey responses will be collected anonymously.

Results from this survey will be used in the development of services to encourage and support the sharing of research software and to ensure the integrity and reproducibility of scholarly activity.

Take the survey now:
https://berkeley.qualtrics.com/jfe/form/SV_aXc6OrbCpg26wo5

The survey will be open until March 20th. If you have any questions about the study or a problem accessing the survey, please contact yasminal@berkeley.edu or John.Borghi@ucop.edu.


Yasmin AlNoamany


Could I re-research my first research?

Can I re-research my first scientific paper?

Last week, one of my teammates at Old Dominion University contacted me and asked if she could apply some of the techniques I adopted in the first paper I published during my Ph.D. She asked about the data and any scripts I had used to pre-process the data and implement the analysis. I directed her to where the data was saved, along with a detailed explanation of the structure of the directories. It took me a while to remember where I had saved the data and the scripts I had written for the analysis. At the time, I did not know about data management and the best practices for documenting my research.

Reproducibility in scientific research. Source: http://www.slideshare.net/carolegoble/open-sciencemcrgoble2015

I shared the scripts I had created for pre-processing the data with my colleague, but the information I gave her did not cover all the details of my workflow. There were many steps I had performed manually to produce the inputs and outputs of the pre-processing scripts. Luckily, I had written a separate document containing the steps of the experiments I conducted to generate the graphs and tables in the paper, along with a clear explanation of the input and the output of each step. When we submit a scientific paper, we get reviews back after a couple of months. That is why I documented everything I had done: so that I could easily regenerate any aspect of my paper if I needed to make future updates.
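Capturing such manual steps as a runnable script is one way to make that kind of document executable. Here is a minimal sketch; the script and file names below are hypothetical placeholders, not the ones from my paper.

```python
# run_pipeline.py -- a minimal sketch of recording an experiment's manual
# steps as code, so each step and its inputs/outputs are explicit.
# The script and file names below are hypothetical placeholders.
import subprocess

STEPS = [
    # (description, command) pairs, in the order the steps must run
    ("Clean the raw data", ["python", "clean_data.py", "raw/input.log", "clean/data.tsv"]),
    ("Extract features",   ["python", "extract_features.py", "clean/data.tsv", "features.csv"]),
    ("Generate figures",   ["python", "make_figures.py", "features.csv", "figures/"]),
]

for description, command in STEPS:
    print(f"{description}: {' '.join(command)}")
    subprocess.run(command, check=True)  # stop immediately if a step fails
```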

The basic entities of scientific research.

Documenting the workflow and the data of my research paper during the active phase of the research saved me the trouble of trying to remember all the steps I had taken. Now my colleague has all the entities of my first research paper: the dataset, the output paper of my research, the scripts that generated this output, and the workflow of the research process (i.e., the steps required to produce this output). She can now repeat the pre-processing of the data using my code in a few minutes.

Funding agencies have data management planning and data sharing mandates. Although these are important to scientific endeavors and research transparency, following good practices in managing research data and documenting the workflow of the research process is just as important. Reproducing research is not only about storing data. It is also about following best practices to organize that data and document the experimental steps so that the data can be easily re-used and the research can be reproduced. Documenting the directory structure of the data in a file and attaching that file to the experiment directory would have saved me a lot of time. Furthermore, having clear guidance on the workflow, and documentation of how the code was built and run, is an important step toward making research reproducible.
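Even a small script can automate that kind of documentation. The sketch below (a hypothetical helper, not something from my paper) writes a manifest of every file in an experiment directory:

```python
# make_manifest.py -- a minimal sketch: record an experiment directory's
# layout in a MANIFEST.txt so future users can navigate the data.
from pathlib import Path

def write_manifest(root: Path, out_name: str = "MANIFEST.txt") -> None:
    """List every file under `root`, with its size, into root/MANIFEST.txt."""
    lines = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.name != out_name:
            rel = path.relative_to(root)
            lines.append(f"{rel}\t{path.stat().st_size} bytes")
    (root / out_name).write_text("\n".join(lines) + "\n")

if __name__ == "__main__":
    write_manifest(Path("."))  # run from the experiment directory
```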

Source: https://sourcemaking.com/antipatterns/reinvent-the-wheel

While I was working on my paper, I adopted multiple well-known techniques and algorithms for pre-processing the data. Unfortunately, I could not find any source code that implemented them, so I had to write new scripts for old techniques and algorithms. To advance scientific research, researchers should be able to build efficiently upon past research, and it should not be difficult for them to apply the basic tenets of scientific methods. My teammate should not have to re-implement the algorithms and techniques I adopted in my research paper. It is time to change the culture of scientific computing to sustain reproducibility and ensure the integrity of research.


Yasmin AlNoamany


Research Advisory Service

Undergrads, get help with that research project from experts. Make an appointment for a 30-minute session with our library research specialists. We can help you narrow your topic, find scholarly sources, and manage your citations, among other things. Make your appointment online!

Appointments are available 11 a.m. – 5 p.m., Monday, September 26 through Friday, December 2. Meet your librarian at the Reference Desk, 2nd floor, Doe Library.

Post submitted by Lynn Jones, Reference Coordinator


Workshop: Out of the Archives, Into Your Laptop

Event date: Friday, February 12, 2016
Event time: 2:00 – 3:30 p.m.
Event location: Doe 308A
Before you head out to do research in the archives this semester, please join us for a workshop on best practices for gathering and digitizing research materials. This workshop will focus on capturing visual and manuscript materials, but will be useful for any researcher collecting research materials from archives. Topics covered will include smart capture workflows, preserving and moving metadata, copyright, and platforms for managing and organizing your research data.

Presenters:

  • Mary Elings, Head of Digital Collections, Bancroft Library
  • Lynn Cunningham, Principal Digital Curator, Art History Visual Resources Center
  • Jason Hosford, Senior Digital Curator, Art History Visual Resources Center
  • Jamie Wittenberg, Research Data Management Service Design Analyst, Research IT
  • Camille Villa, Digital Humanities Assistant, Research IT

Report on the research needs of historians

Ithaka S+R (part of ITHAKA, which brings us JSTOR and Portico) has published the first of many studies it plans to conduct on the changing research methods and practices of scholars in various disciplines. Supporting the Changing Research Practices of Historians examines the needs of historians and provides suggestions for how research support providers (including libraries) can better serve them.

From their web site:
“Our interviews of faculty and graduate students reveal history as a field in transition. It is characterized by a vast expansion of new sources, widely adopted research practices and communication mechanisms shaped by new technologies, and a small but growing subset of scholars utilizing new methodologies to ask questions or share findings in fresh, unique ways.

Research support providers such as libraries, archives, humanities centers, scholarly societies, and publishers – not to mention academic departments that are often at the front line of educating the next generation of scholars – need to innovate in support of these changes. This report provides context and a set of recommendations that we hope will help.”

Download Report