HathiTrust Research Center (HTRC) UnCamp Fellowships

The UCB Libraries are delighted to offer a limited number of general fellowships for free admission to the 2018 HTRC UnCamp at UC Berkeley.
These fellowships are open to current UC Berkeley students and staff. All qualified applicants will be accepted in order of application while fellowships are available, though priority will be given to student applicants. Fellowship applications are due by Nov 13.
Apply here:
Note: Those who do not receive fellowship awards will be informed in time to register at the UnCamp Early Bird price.
About the HathiTrust Research Center (HTRC) 2018 UnCamp
Location: University of California Libraries, Berkeley, CA
Dates: January 25-26, 2018
HTRC UnCamp 2018 aims to facilitate the creation of a national community focused on improving research use of the HathiTrust corpus through computational analysis. The UnCamp will discuss topics relevant to understanding and utilizing the HathiTrust Digital Library corpus within the modern computational research ecosystem. This includes discussion of practices and experiences in mass-scale data mining, visualization, and analysis of the HT collection, with the goal of improving the quality of access to and use of the collection by means of the HTRC Data Capsule and other affiliated research tools.
Stacy Reardon
Literatures and Digital Humanities Librarian
438 Doe Library | University of California, Berkeley | Berkeley, CA 94720
sreardon@berkeley.edu

Event: HathiTrust Research Center Text Analysis Tools workshop

Please join us for a workshop on text analysis with HathiTrust Research Center (HTRC) tools, hosted by the Library and Digital Humanities. This workshop is designed for library and information science professionals, but all are welcome. 
Workshop: HathiTrust Research Center Text Analysis Tools
Date & time: August 15th from 2:30 – 4:30
Location: Berkeley Institute for Data Science (190 Doe Library) UC Berkeley
This workshop will introduce attendees to text analysis, primarily as it is employed in the digital humanities, as well as common methods and tools used in this area of scholarship. The session will provide an overview of the field, with particular attention to the HathiTrust Research Center (HTRC) and its tools and services. The HTRC is developing capacity for researchers to build sub-corpora of text from the HathiTrust Digital Library and perform non-consumptive analysis on them. Workshop attendees will develop skills that will allow them to actively support and partner in text analysis research.
I hope to see you there,


Jamie V. Wittenberg

Research Data Management Service Design Analyst
Research IT  |  Library
University of California, Berkeley

Bancroft to Explore Text Analysis as Aid in Analyzing, Processing, and Providing Access to Text-based Archival Collections

Mary W. Elings, Head of Digital Collections, The Bancroft Library

The Bancroft Library recently began testing a theory discussed at the Radcliffe Workshop on Technology & Archival Processing held at Harvard’s Radcliffe College in early April 2014. The theory suggested that archives can use text analysis tools and topic modelling (a type of statistical model for discovering the abstract “topics” that occur in a collection of documents) to aid in analyzing, processing, and describing text-based archival collections, as well as in improving access to them.

To help us test this theory, the Bancroft welcomed summer intern Janine Heiser from the UC Berkeley School of Information. Over the summer, supported by an ISchool Summer Non-profit Internship Grant, Ms. Heiser worked with digitized analog archival materials to answer specific research questions and define use cases that will help us determine whether text analysis and topic modelling are viable technologies for our archival work. Based on that work, the Bancroft has recently awarded Ms. Heiser an Archival Technologies Fellowship for 2015 so that she can continue to develop and test what she began over the summer.

During her summer internship, Ms. Heiser created a web-based application called “ArchExtract” that extracts topics and named entities (people, places, subjects, dates, etc.) from a given collection. This application implements and extends various natural language processing software tools such as MALLET and the Stanford CoreNLP toolkit. To test and refine this web application, Ms. Heiser used collections with an existing catalog record and/or finding aid, namely the John Muir Correspondence collection, which was digitized in 2009.

For a given collection, an archivist can compare the topics and named entities that ArchExtract outputs to the topics found in the extant descriptive information, looking at the similarities and differences between the two in order to verify ArchExtract’s accuracy. After evaluating the accuracy, the ArchExtract application can be improved and/or refined.
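The comparison described above can be pictured with a small sketch. This is not ArchExtract's code or its evaluation method: the function name compare_terms, the choice of precision and recall as measures, and the sample terms are all assumptions made for illustration. It treats the tool's output and the finding aid's terms as two sets and reports where they agree and differ.

```python
def compare_terms(extracted, described):
    """Compare extracted terms with terms from an existing description.

    Matching is exact after lowercasing and trimming whitespace; a real
    evaluation would likely need fuzzier matching and human review.
    """
    def normalize(terms):
        return {t.strip().lower() for t in terms if t.strip()}

    ext, desc = normalize(extracted), normalize(described)
    shared = ext & desc
    return {
        "shared": sorted(shared),
        "only_extracted": sorted(ext - desc),   # candidates for new description
        "only_described": sorted(desc - ext),   # terms the tool missed
        "precision": len(shared) / len(ext) if ext else 0.0,
        "recall": len(shared) / len(desc) if desc else 0.0,
    }


# Hypothetical sample data, not taken from any actual finding aid.
extracted = ["John Muir", "Yosemite", "Sierra Club", "glaciers"]
described = ["John Muir", "Yosemite", "Sierra Club", "Hetch Hetchy"]
report = compare_terms(extracted, described)
print(report)
```

Terms that appear only in the tool's output may point to content the existing description omits, while terms that appear only in the description show where the tool falls short.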

Ms. Heiser also worked with collections that have minimal or no extant description in order to explore this theory further as we continue to test the tool. Working with Bancroft archivists, Ms. Heiser will determine whether the web application is successful, where it falls short, and what the next steps might be in exploring this and other text analysis tools to aid in processing collections.

The hope is that automated text analysis will let libraries and archives readily identify the major topics found in a collection, and potentially the named entities found in the text and their frequency, thus giving archivists a good understanding of the scope and content of a collection before it is processed. This could help in identifying processing priorities and funding opportunities, and ultimately in helping users identify what is found in the collection.
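As a rough illustration of what a named-entity frequency summary might look like, the sketch below tallies entities that have already been extracted from each document. The function name entity_frequencies and the sample data are hypothetical, and the extraction step itself (for example with an NLP toolkit) is assumed to have happened elsewhere.

```python
from collections import Counter


def entity_frequencies(entities_per_document):
    """Return (total mentions, number of documents mentioning) per entity."""
    mentions = Counter()
    doc_counts = Counter()
    for entities in entities_per_document:
        mentions.update(entities)         # every mention counts
        doc_counts.update(set(entities))  # each document counts once
    return mentions, doc_counts


# Hypothetical entities for three letters in a collection.
letters = [
    ["John Muir", "Yosemite", "Yosemite"],
    ["John Muir", "Sierra Club"],
    ["Yosemite"],
]
mentions, doc_counts = entity_frequencies(letters)
print(mentions.most_common(3))
print(doc_counts.most_common(3))
```

Counting documents separately from mentions helps distinguish an entity that runs through a whole collection from one that is mentioned heavily in a single item.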

Ms. Heiser is a second-year master’s student at the UC Berkeley School of Information, where she is learning the theory and practice of storing, retrieving, and analyzing digital information in a variety of contexts and is currently taking coursework in natural language processing with Marti Hearst. Prior to the ISchool, Ms. Heiser worked at several companies where she helped develop database systems and software for political parties, non-profit organizations, and an online music distributor. In her free time, she likes to go running and hiking around the Bay Area. Ms. Heiser was also one of our participants in the #HackFSM hackathon! She was awarded an ISchool Summer Non-profit Internship Grant to support her work at Bancroft this summer and has been awarded an Archival Technologies Fellowship at Bancroft for 2015.