What a semester! What’s up next?

Photo by Karen Lau on Unsplash

Is it just us, or was fall semester a whirlwind? The Office of Scholarly Communication Services was steeped in a steady flurry of activity, and suddenly it’s December! We wanted to take a moment to highlight what we’ve been up to since August, and give you a preview of what’s ahead for spring.

We did the math on our affordable course content pilot program, which ran for academic year 2017-2018 and Fall 2018. This pilot supported just over 40 courses and 2400 students, and is estimated to have yielded approximately $200,000 in student savings. We’ll be working with campus on next steps for helping students save money. If you have questions about how to make your class more affordable, you can check out our site or e-mail us.

We dug deep into scholarly publishing skills with graduate students and early career researchers during our professional development workshop series. We engaged learners in issues like copyright and their dissertations, moving from dissertation to first book, and managing and maximizing scholarly impact. Publishing often isn’t complete without sharing one’s data, so we helped researchers understand how to navigate research data copyright and licensing issues at #FSCI2018.

We helped instructors and scholars publish open educational resources and digital books with PressbooksEDU on our new open books hub.

On behalf of the UC’s Council of University Librarians, we chaired and hosted the Choosing Pathways to OA working forum. The forum brought together approximately 125 representatives of libraries, consortia, and author communities throughout North America to develop personalized action plans for how we can all transition funds away from subscriptions and toward sustainable open access publishing. We will be reporting on forum outcomes in 2019. In the meantime, one immediate result was the formation of a working group to support scholarly society journal publishers in flipping their journals from closed access to open access. Stay tuned for an announcement in January.

We funded dozens of open access publications by UC Berkeley authors through our BRII program.

We developed a novel literacies workflow for text data mining researchers. Text mining allows researchers to use automated techniques to glean trends and information from large volumes of unstructured textual sources. Researchers often perceive legal stumbling blocks to conducting this type of research, since some of the content is protected by copyright or other use restrictions. In Fall 2018, we began training the UC Berkeley community on how to navigate these challenges so that they can confidently undertake this important research. We’ll have a lot more to say about our work on this soon!

Next semester, we’re continuing all of these efforts with a variety of scholarly publishing workshops. We invite you to check out: Copyright & Fair Use for Digital Projects, Text Data Mining & Publishing: Legal Literacies, Copyright for Wikipedia Editing, and more.

We would like to thank Arcadia, a charitable fund of Lisbet Rausing and Peter Baldwin, for their generous support in helping to make the work of the Office of Scholarly Communication Services possible.

Lastly, we’d like to thank all of you for your engagement and support this semester! Please let us know how else we can serve you. In the meantime, we wish you a Happy New Year!

E-mail: schol-comm@berkeley.edu

Twitter: @UCB_scholcomm

Website: lib.berkeley.edu/scholcomm


HathiTrust Research Center (HTRC) UnCamp Fellowships

The UCB Libraries are delighted to offer a limited number of general fellowships for free admission to the 2018 HTRC UnCamp at UC Berkeley.
These fellowships are open to current UC Berkeley students and staff. All qualified applicants will be accepted in order of application while fellowships are available, though priority will be given to student applicants. Fellowship applications are due by Nov 13.
Apply here:
Note: Those who do not receive fellowship awards will be informed in time to register at the UnCamp Early Bird price.
About the HathiTrust Research Center (HTRC) 2018 UnCamp
Location: University of California Libraries, Berkeley, CA
Dates: January 25-26, 2018
HTRC UnCamp 2018 aims to facilitate the creation of a national community focused on improving research use of the HathiTrust corpus through computational analysis. The UnCamp will discuss topics relevant to understanding and utilizing the HathiTrust Digital Library corpus within the modern computational research ecosystem. This includes discussion of practices and experiences in mass-scale data mining, visualization, and analysis of the HT collection, with the goal of improving the quality of access and use of the collection by means of the HTRC Data Capsule and other affiliated research tools.
Stacy Reardon
Literatures and Digital Humanities Librarian
438 Doe Library | University of California, Berkeley | Berkeley, CA 94720
sreardon@berkeley.edu

Where to Find the Texts for Text Mining

Sketch for Monotype Digital Type Wall
frame1351170437122. Marcin Ignac, CC BY-NC-ND 2.0

Text mining, the process of computationally analyzing large swaths of natural language texts, can illuminate patterns and trends in literature, journalism, and other forms of textual culture that are sometimes discernible only at scale, and it’s an important digital humanities method. If text mining interests you, then finding the right tool — whether you turn to an entry-level system like Voyant or master a programming language like Python — is only a part of the solution. Your analyses are only as strong as the texts you’re working with, after all, and finding authoritative text corpora can sometimes be difficult due to paywalls and licensing restrictions. The good news is the UC Berkeley Libraries offer a range of text corpora for you to analyze, and we can help you get your hands on things we don’t already have access to.
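To make the idea concrete, here is a minimal sketch of what a first pass at text mining in Python might look like: counting how often each word appears across a set of documents. The two short strings below are a hypothetical stand-in corpus invented for illustration, and the helper names (tokenize, term_counts) are our own rather than part of any library. A real project would load texts from one of the sources described below and would likely use a dedicated tokenizer.

```python
import re
from collections import Counter

# Hypothetical stand-in corpus (an assumption for illustration): in practice
# these would be texts obtained from a library-provided source.
corpus = {
    "doc_a": "The whale surfaced. The crew watched the whale.",
    "doc_b": "A quiet harbor, a quiet crew, and no whale in sight.",
}

def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())

def term_counts(texts):
    """Count how often each token appears across all documents."""
    counts = Counter()
    for text in texts.values():
        counts.update(tokenize(text))
    return counts

counts = term_counts(corpus)

# Show the three most frequent tokens in the whole corpus.
print(counts.most_common(3))
```

Even a simple count like this only becomes meaningful with many documents, which is why the quality and size of the underlying corpus matter so much.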

The first step in your exploration should be the library’s Text Mining Guide, which lists text corpora that are either publicly accessible (e.g., the Library of Congress’s Chronicling America newspaper collection) or are available to UCB faculty, students, and staff (e.g., JSTOR Data for Research). The content of these sources is available in a variety of formats: you may be able to download the texts in bulk, use an API, or make use of a content provider’s in-platform tools. In other cases (e.g., ProQuest Historical Newspapers), the library may be able to arrange access upon request. While the scope of the corpora we have access to is wide, we are particularly strong in newspaper collections, pre-20th century English literature collections, and scholarly texts.

What happens if the library doesn’t have what you need? We regularly facilitate the acquisition of text corpora upon request, and you can always email your subject librarian with specific requests or questions. The library will deal with licensing questions so you don’t have to, and we’ll work with you to figure out the best way to make the texts available for your work, often with the help of our friends in the D-Lab or Research IT. We also offer the Data Acquisition and Access Program to provide special funding for one-time data set purchases, including text corpora. Your requests and suggestions help the library develop our collection, making text mining easier for the next researcher who comes along.

Important caveats:

  • Unless explicitly stated, our contracts for most library databases and resources (e.g., Scopus, Project MUSE) don’t allow for bulk download. Please avoid web scraping licensed library resources on your own: content providers notice quickly when this happens, and they react by shutting down access for our entire campus. Ask your subject librarian for help instead.
  • Keep in mind that many of the vendors themselves are limited in how, and how much, access they can provide to a particular resource, based on their own contractual agreements. It’s not uncommon for specific contemporary newspapers and journals to be unavailable for analysis at scale, even when library funding for access may be available.


Stacy Reardon and Cody Hennesy
Contact us at sreardon [at] berkeley.edu; chennesy [at] berkeley.edu