Once again, UC Libraries are collaborating on a UC-wide Love Data Week series of talks, presentations, and workshops Feb. 14-18, 2022. With over 30 presentations and workshops, there’s plenty to choose from, with topics such as:
- How to write effective data management plans
- Text analysis with Python
- How and where to share your research data
- Geospatial analysis with R and with Jupyter Notebooks
- Data ethics & justice
- Cleaning and coding data for qualitative analysis
- Software management for researchers
- An introduction to databases for newspapers and social science data
- 3-D data, visualization, and mapping
All members of the UC community are invited to attend these events to gain hands-on experience, learn about resources, and engage in discussions about data needs throughout the research process. To register for workshops during this week and see what other sessions will be offered UC-wide, visit the UC Love Data Week 2022 website.
The Library Data Services Program is offering a series of workshops on working with qualitative and textual data. Each workshop is designed to help novice learners get started with cleaning, organizing, analyzing, and presenting qualitative or textual data. Sessions include cleaning and coding qualitative data in MAXQDA and the open-source Taguette program, organizing and writing up research projects in Scrivener, and archiving qualitative data once a project has been completed. Each workshop serves as a starting point for learning concepts and will familiarize attendees with additional resources for getting help.
Wednesday, January 26th: 10:00 – 11:00 AM
Tuesday, February 15th: 10:00 AM – 12:00 PM
Monday, March 14th: 1:00 – 3:00 PM
Monday, April 18th: 1:00 – 3:00 PM
Even beyond those who believe that librarians sit around and read books all day (which would be delightful but is most definitely not our reality), many are surprised to learn that librarians double as active researchers. This is especially true in settings where librarians are members of the faculty, but even where that isn’t the case, such as at Berkeley, librarians are born investigators and it carries over into wanting to find out about and add to knowledge of our settings.
What does it look like to conduct library research? Glad you asked! In our case, it started with a conversation and an idea. Natalia Estrada (now Berkeley’s Political Science and Public Policy Librarian, then the Social Sciences Collection and Reference Assistant and in library school) and I were talking about how much we admired the work of Kaetrena Davis Kendrick. Kendrick wrote a foundational work in the study of librarian workplace morale, The Low Morale Experience of Academic Librarians: A Phenomenological Study, and it sparked many more studies on this topic. But, where were the studies of library staff experiences? We wanted to find out!
We were lucky to recruit two colleagues who added so much to the team: Bonita Dyess, Circulation/Reserves Supervisor at the Earth Sciences/Map Library, and Celia Emmelhainz, Berkeley’s Anthropology & Qualitative Research Librarian. First we applied for (and eventually got) funding for the research from LAUC (the Librarians Association of the University of California). This meant we could pay for transcribing our interviews, give the participants gift cards, and buy qualitative data analysis software. Then we applied for (and got) approval from the IRB (Institutional Review Board), making sure we were complying with processes for research with human subjects.
Here’s where the “pandemic edition” part comes in. All this planning and applying, starting in November 2019, took time; so, at the point we were actually ready to recruit participants, it was April 2020. We were sheltering in place, and not sure how this all would work (although it was probably better than having to go virtual in mid-stream)! Nevertheless, we hurled out information about and invitations to be part of the study to every listserv, association, and friendly librarian we could think of, nationwide. We ended up doing 34 interviews with academic library staff from a range of locations and institution types (purposefully excluding the UC system), during a three-week period in May-June 2020. Due to COVID, these were all online, either by phone or Google Meet (sort of like Zoom), and we asked a structured list of questions, with room for branching into other topics, or diving deeply. Celia trained a wonderful student to transcribe the interviews, and once we had those transcripts and stripped identifying information from them, we were off: coding away (using MAXQDA software) and drawing themes, quotes, recommendations, and other findings from the surprisingly rich information we’d collected.
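For readers curious what "stripping identifying information" from a transcript can look like in practice, here is a minimal Python sketch. It is not the workflow our team used; the function name, the placeholder labels, and the idea of supplying a hand-made list of names are all assumptions made for illustration, and real de-identification needs a careful human read-through as well.

```python
import re

# Hypothetical helper for illustration only; not the study team's actual process.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_transcript(text, known_names):
    """Replace e-mail addresses and a supplied list of names with placeholders."""
    redacted = EMAIL_PATTERN.sub("[EMAIL]", text)
    # Handle longer names first so "Jane Doe" is replaced before "Jane".
    for name in sorted(known_names, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", flags=re.IGNORECASE)
        redacted = pattern.sub("[NAME]", redacted)
    return redacted


sample = "Jane Doe said to write to jane.doe@example.edu; Jane works in circulation."
print(redact_transcript(sample, ["Jane Doe", "Jane"]))
```

A simple pattern-based pass like this only catches what it is told to look for, so it works best as a first sweep before someone reads each transcript for indirect identifiers such as job titles or building names.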
Next—we had to start getting the information out into the world! Our eventual goal is to write a paper, or several, for publication. There are a number of library and information science journals out there that we are considering… but that takes time as well, and we wanted to start presenting our findings sooner. So, we did an “initial findings” presentation to the UC Berkeley Library Research Working Group, and then stepped into the big time with acceptance to present a poster at the 2021 Association of College and Research Libraries online conference (our poster got almost 600 views), and with a webinar we did for the Pennsylvania Library Association (both the poster and the webinar slides are available through the UC’s eScholarship portal). All our work to get to this point is hopefully now helping others.
And, a word about connecting with our participants. We were bowled over by their generosity with us and by all they had to say: much that we didn’t expect, and much that they were grateful someone was even asking about. It ended up that we had captured one of the last opportunities to get a snapshot of pre-COVID library staff life; people were still in limbo, and talked about their regular jobs before any lockdowns, for the most part. At that point most expected to be back in their libraries and all to be normal by the end of the summer 2020. We know now that that didn’t happen, and we know that library re-openings and staff roles in them have been challenging and sometimes contentious; we wish we’d known to ask for permission to re-interview our participants—even if only to check in with them. But how could we have known? We wonder how they are.
So now, we have papers to write, and thinking to do about how to take our questions into new avenues of research—because it’s a never-ending, and completely exciting process, and, we suspect, will be very different (easier? or not?) in the post-COVID landscape. Do you have ideas for us? We’d love to hear them! Or want to hear more about our morale study? Please get in touch with us at email@example.com!
Since our Love Data Week invitation post last year, the COVID pandemic has created a new world— and amazing new opportunities and challenges related to data. Just a peek at data.berkeley.edu (the portal for Berkeley’s Computing, Data Science, and Society Division) shows that data-related research during this past pandemic year, even with its intense and difficult challenges, has revealed new insights. Check out “Pandemic provides real-time experiment for diagnosing, treating misinformation, disinformation”.*
So, it’s fitting that Love Data Week 2021 at Berkeley, hosted by the UC Berkeley Library in partnership with Berkeley’s Research IT department, is focused on the kinds of issues we are confronted with in a wholly-online research environment. Join us on Tuesday for a session on ethical considerations in data, most definitely a concern with many of Berkeley’s researchers looking at issues related to COVID; on Wednesday for a talk on cybersecurity (aimed at graduate researchers but all are welcome); on Thursday for another security-related workshop, “Getting Started with LastPass & Veracrypt”; and on Friday for an introduction to Savio, Berkeley’s high performance computing cluster. Please click on this link for information on these, and registration links!
Questions? E-mail LDW 2021 at firstname.lastname@example.org . And, if we’ve whetted your appetite for data and more data, take a look at the University of California-wide Love Data Week offerings. If you’ve ever wondered what an API is, or want a quick intro to SQL, or even just want to know what the acronyms stand for, there are these sessions and more!
* The same page makes it clear that data is for everyone; check out “I Am a Data Scientist”, about a student who came to Berkeley as an English major and discovered how data can “shed light on larger-scale questions”, and “Translating Numbers Into Words: The Art of Writing About Data Science”, featuring three Berkeleyites who are getting the word out about data.
Are you unsure about how you can use or reuse other people’s data in your teaching or research, and what the terms and conditions are? Do you want to share your data with other researchers or license it for reuse but are wondering how and if that’s allowed? Do you have questions about university or granting agency data ownership and sharing policies, rights, and obligations? We will provide clear guidance on all of these questions and more in this interactive webinar on the ins-and-outs of data sharing and publishing.
- Explore venues and platforms for sharing and publishing data
- Unpack the terms of contracts and licenses affecting data reuse, sharing, and publishing
- Help you understand how copyright does (and does not) affect what you can do with the data you create or wish to use from other people
- Consider how to license your data for maximum downstream impact and reuse
- Demystify data ownership and publishing rights and obligations under university and grant policies
Intended audiences include faculty, grad students, post-docs, instructors, and academic support staff, but anyone interested is welcome to attend.
Although we don’t always think of it that way, one federal government program that affects each of us in the United States is the decennial census. And among the challenges of many kinds that a pandemic has brought us, its effect on gathering good quality census data is high on the list.
Earlier this year, the Library hosted a well-attended (physical) exhibit related to the census, Power and the People: The US Census and Who Counts (which can still be experienced online). Related to the exhibit, we were on board with our plan to host a panel of campus experts on the contested race and ethnicity questions in the census, and how they’ve shifted over time…. Until March 17, when the Bay Area went into a shelter-in-place order and the program had to be postponed. But last month, thanks to a persistent team, generous panelists, and the wonders of Zoom, we were thrilled to be able to present the panel at last, online!
The program, titled Checking the Boxes: Race(ism), Latinx and the Census, featured three UC Berkeley experts on racial and ethnic categorizations in the census. Cristina Mora (Associate Professor of Sociology and Chicano/Latino studies), Tina Sacks (Assistant Professor, School of Social Welfare), and Victoria Robinson (Lecturer and American Cultures Program Director, Department of Ethnic Studies) were joined by our moderator, librarian Jesse Silva, for presentations and a lively Q&A.
Professor Mora started the program off with the information that “ethnic and race categories are political constructs… They are not set-in-stone scientific markers of identity or genetic composition.” She noted that since the census counts are directly related to funding, communities have a vested interest in getting accurate and complete counts, but this can be very difficult for groups and areas that are designated Hard to Count. Professor Sacks continued by emphasizing the ways in which census-driven funding allocations can affect people in poverty and those in social safety net programs. She also noted the intersections shown by census data between race and place, such as areas with a substantial number of incarcerated people. Finally Professor Robinson added background and context by discussing the site racebox.org, which shows the history of the race questions on the census from 1790 onwards, and which illuminates the changes in the cultural and social conceptions of what race is and how it can be measured.
The program concluded with an animated question and answer period, which included Professor Mora’s elaborating on the differences between racial and ethnic categories, Professor Sacks (who has actually been a census enumerator) discussing the challenges of counting the homeless population, and Professor Robinson revisiting the question of incarceration and the Attica problem: “[Incarcerated people’s] residence is considered to be a prison. That’s not their home, and the relationship then to the power…in the communities that they [aren’t from], that’s the Attica problem.”
Of course, this summary doesn’t do justice to the range and depth of the issues discussed. If you missed this program, or would like to see it again, check it out on the UC Berkeley Library’s YouTube channel!
The California Digital Library (CDL) recently partnered with Dryad to provide enhanced data publishing and curation support for researchers. Dryad is a free service that enables researchers to archive and make publicly available their research data for the long term. Dryad replaces Dash, which was the data repository previously available to the university.
Datasets published in Dryad receive a Digital Object Identifier (DOI) and a citation, which together give the data a persistent location and identifier and make the data citable in future use. Additionally, Dryad fulfills many of the data sharing requirements stipulated by funders and publishers, many of whom may require that data be made freely and openly available at the end of a project or upon publication.
Publishing data to Dryad is relatively quick and easy. As a UC Berkeley researcher, begin the upload process by signing in to Dryad using your ORCID ID. A curator then reviews and enriches the data so that it is Findable, Accessible, Interoperable, and Reusable (FAIR). By making your data FAIR, others in your area of expertise will be able to locate, understand, and potentially reuse the data you generated. Data that is made easily findable and publicly available contributes to raising the quality of scholarly output by making the process of data production transparent. Funders require data publishing to better leverage research dollars and publishers require data publishing to enhance the quality of scholarly literature.
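To make the link between a DOI and a citation concrete, here is a small Python sketch. The dataset, authors, and DOI suffix are invented placeholders, and the citation layout is an assumed, simplified format rather than Dryad's official one; the only fixed piece is that a DOI can be turned into a persistent link by prefixing it with the https://doi.org/ resolver.

```python
def doi_url(doi):
    """Return the persistent resolver link for a DOI."""
    return "https://doi.org/" + doi.strip()


def format_citation(authors, year, title, doi, publisher="Dryad"):
    """Build a simple dataset citation string.

    The layout (Authors (Year), Title, Publisher, Dataset, URL) is an
    assumption for illustration, not an official citation style.
    """
    return f"{'; '.join(authors)} ({year}), {title}, {publisher}, Dataset, {doi_url(doi)}"


# Hypothetical dataset: every value below is a placeholder.
citation = format_citation(
    ["Doe, Jane", "Roe, Richard"],
    2020,
    "Example field measurements",
    "10.5061/dryad.example",
)
print(citation)
```

Because the link is built from the DOI rather than from a web address on a particular server, the citation keeps working even if the repository reorganizes its site.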
Please visit datadryad.org to explore published datasets. If you have any questions about preparing your data for publication or using Dryad, please contact email@example.com.
This nationwide campaign is designed to raise awareness about data management, security, sharing, and preservation. Students, researchers, librarians, and data specialists are invited to attend these events to gain hands-on experience, learn about resources, and engage in discussion around data needs throughout the research process.
To register for these events and find out more, please visit: https://guides.lib.berkeley.edu/ldw2019
MONDAY, FEBRUARY 11
Intro to Savio workshop
3:30-5:00 pm, Dwinelle 117 (Academic Innovation Studio)
Berkeley Research Computing is offering an introductory training session on using Savio, the campus Linux high-performance computing cluster. We’ll give an overview of how the cluster is set up, different ways you can get access to the cluster, logging in, transferring files, accessing software, and submitting and monitoring jobs. New, prospective, and current users are invited.
TUESDAY, FEBRUARY 12
Code Ocean lunch & learn
12:00-1:00 pm, Doe Library, Room 190 (BIDS)
Join us for a demonstration and Q&A session on the Code Ocean platform! Code Ocean is a cloud-based computational reproducibility platform that provides researchers and developers an easy way to share, discover, and run code published in academic journals and conferences.
TUESDAY, FEBRUARY 12
Preparing your data and code for reproducible publication
2:00-4:00 pm, Doe Library, Room 190 (BIDS)
This is a step-by-step, practical workshop to prepare your research code and data for computationally reproducible publication. The workshop starts with some brief introductory information about computational reproducibility, but the bulk of the workshop is guided work with code and data. We cover the basic best practices for publishing code and data.
WEDNESDAY, FEBRUARY 13
Shaping Clouds: Scaling Infrastructure for Research and Instruction at Berkeley
1:00-2:00 pm, Doe Library, Room 190 (BIDS)
There are many great resources for research and instruction across campus, but it can be difficult to determine what is available and where to find it. Join us for a showcase and community discussion about two cutting-edge cloud platforms, Analytic Environments on Demand (AEoD) and JupyterHub, and how best to provide a holistic ecosystem of these and other tools.
THURSDAY, FEBRUARY 14
Data Security: I just called to say I love you
1:00-2:00 pm, Dwinelle 117 (Academic Innovation Studio)
Learn what love the Information Security & Policy office shows campus and why a day without ISP would break the University’s heart. We will also talk about simple ways you can protect your identity and show your data love.
The UC Berkeley Library is hosting the 2018 Library Carpentry Sprint on May 10th and 11th. This sprint is a part of the larger 2018 Mozilla Global Sprint, and will take place in the Berkeley Institute for Data Science (BIDS), 190 Doe Library from 2-5pm on Thursday, May 10th and from 1-5pm on Friday, May 11th. All are welcome and no experience with Library Carpentry or participating in a sprint is required. Come help us update the existing Library Carpentry curriculum or just come to see what Library Carpentry is all about. If you wish to sign up in advance, simply add your name to the Library Carpentry sprint etherpad under the UC Berkeley section. More information about Library Carpentry can be found here.
Library Carpentry Sprint is an international campaign that is a part of the larger Mozilla Global Sprint 2018. The goal of this Library Carpentry sprint is to improve/extend Library Carpentry lessons. Participants can contribute code or content, proofread writing, help with visual design and graphic art, do QA (quality assurance) on prototype tools, or advise or comment on project ideas or plans. All skill levels are welcome!
You can drop by anytime on May 10th from 2-5pm or May 11th from 1-5pm
Berkeley Institute for Data Science (BIDS), 190 Doe Memorial Library
Contact Scott Peterson, firstname.lastname@example.org
Last week, the University Library, the Berkeley Institute for Data Science (BIDS), and the Research Data Management program were delighted to host Love Data Week (LDW) 2018 at UC Berkeley. Love Data Week is a nationwide campaign designed to raise awareness about data visualization, management, sharing, and preservation. The theme of this year’s campaign was data stories, exploring how data is being used in meaningful ways to shape the world around us.
At UC Berkeley, we hosted a series of events designed to help researchers, data specialists, and librarians to better address and plan for research data needs. The events covered issues related to collecting, managing, publishing, and visualizing data. The audiences gained hands-on experience with using APIs, learned about resources that the campus provides for managing and publishing research data, and engaged in discussions around researchers’ data needs at different stages of their research process.
Participants from many campus groups (e.g., LBNL, CSS-IT) were eager to continue the stimulating conversation around data management. Check out the full program and information about the presented topics.
Photographs by Yasmin AlNoamany for the University Library and BIDS.
LDW at UC Berkeley kicked off with a walkthrough and demos of the Scopus APIs (Application Programming Interfaces), led by Eric Livingston of the publishing company Elsevier. Elsevier provides a set of APIs that allow users to access the content of journals and books published by Elsevier.
In the first part of the session, Eric provided a quick introduction to APIs and an overview of Elsevier APIs. He illustrated the purposes of the different APIs that Elsevier provides, such as the ScienceDirect APIs, SciVal API, Engineering Village API, Embase APIs, and Scopus APIs. As mentioned by Eric, anyone can get free access to Elsevier APIs, and the content published by Elsevier under Open Access licenses is fully available. Eric explained that Scopus APIs allow users to access curated abstracts and citation data from all scholarly journals indexed by Scopus, Elsevier’s abstract and citation database. He detailed multiple popular Scopus APIs such as the Search API, Abstract Retrieval API, Citation Count API, Citation Overview API, and Serial Title API. Eric also gave an overview of the amount of data that the Scopus database holds.
In the second half of the workshop, Eric explained how Scopus APIs work and how to get a key to Scopus APIs, and showed different authentication methods. He walked the group through live queries, showed them how to extract data from the API, and explained how to debug queries using the advanced search. He talked about the limitations of the APIs and provided tips and tricks for working with Scopus APIs.
Eric left the attendees with actionable and workable code and scripts to pull and retrieve data from Scopus APIs.
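For a sense of what such a script might look like, here is a minimal Python sketch of a Scopus Search API query. It is not the code Eric shared. The endpoint URL, the `X-ELS-APIKey` header, and the `search-results` / `entry` / `dc:title` response fields reflect our understanding of Elsevier's public developer documentation and should be treated as assumptions to verify there; the function names and the sample payload are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; confirm against Elsevier's developer documentation.
SCOPUS_SEARCH_URL = "https://api.elsevier.com/content/search/scopus"


def build_search_request(query, api_key, count=5):
    """Build (but do not send) a request for the Scopus Search API."""
    params = urllib.parse.urlencode({"query": query, "count": count})
    return urllib.request.Request(
        f"{SCOPUS_SEARCH_URL}?{params}",
        # The API key is sent as a request header (assumed header name).
        headers={"X-ELS-APIKey": api_key, "Accept": "application/json"},
    )


def extract_titles(payload):
    """Pull document titles out of a parsed search response."""
    entries = payload.get("search-results", {}).get("entry", [])
    return [entry.get("dc:title", "") for entry in entries]


def search_scopus(query, api_key):
    """Send the query and return the titles of the matching records."""
    with urllib.request.urlopen(build_search_request(query, api_key)) as response:
        return extract_titles(json.load(response))


# Offline demonstration with a made-up payload shaped like the assumed response.
sample_payload = {"search-results": {"entry": [{"dc:title": "A hypothetical article"}]}}
print(extract_titles(sample_payload))
```

With a real key, a call such as `search_scopus("TITLE-ABS-KEY(library morale)", "YOUR-API-KEY")` would send the query over the network. Splitting request building from response parsing makes it easier to debug a query without using up API quota.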
On the second day, we hosted a Data Stories and Visualization Panel, featuring Claudia von Vacano (D-Lab), Garret S. Christensen (BIDS and BITSS), Orianna DeMasi (Computer Science and BIDS), and Rita Lucarelli (Department of Near Eastern Studies). The talks and discussions centered upon how data is being used in creative and compelling ways to tell stories, in addition to rewards and challenges of supporting groundbreaking research when the underlying research data is restricted.
Claudia von Vacano, the Director of D-Lab, discussed the Online Hate Index (OHI), a joint initiative with the Anti-Defamation League’s (ADL) Center for Technology and Society that uses crowd-sourcing and machine learning to develop scalable detection of the growing amount of hate speech within social media. In its recently completed initial phase, the project focused on training a model based on an unbiased dataset collected from Reddit. Claudia explained the process, from identifying the problem, defining hate speech, and establishing rules for human coding, through building, training, and deploying the machine learning model. Going forward, the project team plans to improve the accuracy of the model and extend it to include other social media platforms.
Next, Garret S. Christensen, BIDS and BITSS fellow, talked about his experience with research data. He started by providing a background about his research, then discussed the challenges he faced in collecting his research data. The main research questions that Garret investigated are: How are people responding to military deaths? Do large numbers of, or high-profile, deaths affect people’s decision to enlist in the military?
Garret discussed the challenges of obtaining and working with Department of Defense data obtained through a Freedom of Information Act request for the purpose of researching war deaths and military recruitment. Despite all the challenges that Garret faced and the time he spent on getting the data, he succeeded in putting the data together into a public repository. Now the information on deaths in the US Military from January 1, 1990 to November 11, 2010 that was obtained through a Freedom of Information Act request is available on Dataverse. At the end, Garret showed that deaths and recruits have a negative relationship.
Orianna DeMasi, a graduate student of Computer Science and BIDS Fellow, shared her story of working with human subjects data. The focus of Orianna’s research is on building tools to improve mental healthcare. Orianna framed her story about collecting and working with human subject data as a fairy tale story. She indicated that working with human data makes security and privacy essential. She has learned that it’s easy to get blocked “waiting for data” rather than advancing the project in parallel to collecting or accessing data. At the end, Orianna advised the attendees that “we need to keep our eyes on the big problems and data is only the start.”
Rita Lucarelli (Department of Near Eastern Studies) discussed the Book of the Dead in 3D project, which shows how photogrammetry can aid the visualization and study of different sets of data within their own physical context. According to Rita, the “Book of the Dead in 3D” project aims in particular to create a database of “annotated” models of the ancient Egyptian coffins of the Hearst Museum, which is radically changing the scholarly approach to and study of these inscribed objects, while at the same time posing a challenge in relation to data sharing and the publication of the artifacts. Rita indicated that metadata is growing and that digital data and digitization are challenging.
It was fascinating to hear about Egyptology and how to visualize 3D ancient objects!
We closed out LDW 2018 at UC Berkeley with a session about Research Data Management Planning and Publishing. In the session, Daniella Lowenberg (University of California Curation Center) started by discussing the reasons to manage, publish, and share research data on both practical and theoretical levels.
Daniella shared practical tips about why, where, and how to manage research data and prepare it for publishing. She discussed relevant data repositories that UC Berkeley and other entities offer. Daniella also illustrated how to make data reusable, and highlighted the importance of citing research data and how this maximizes the benefit of research.
At the end, Daniella presented a live demo on using Dash for publishing research data and encouraged UC Berkeley workshop participants to contact her with any questions about data publishing. In a lively debate, researchers shared their experiences of managing research data with Daniella and highlighted what has worked and what has proved difficult.
We have received overwhelmingly positive feedback from the attendees. Attendees also expressed their interest in having similar workshops to understand the broader perspectives and skills needed to help researchers manage their data.
I would like to thank BIDS and the University Library for sponsoring the events.