Building Legal Literacies for Text Data Mining: Call for Participants

Join the Building Legal Literacies for Text Data Mining (Building LLTDM) Institute June 23-26, 2020 on the UC Berkeley campus to learn how to confidently navigate United States law, policy, ethics, and risk within digital humanities text data mining projects, so that participants can more easily engage in this type of research and contribute to the advancement of knowledge.

The program will cover how law and policy matters, such as copyright, privacy, and ethics, pertain to text data mining research. It will also help participants integrate workflows for these law and policy issues into their text data mining research and professional support, practice sharing these new tools through authentic consultation exercises, and develop communities of practice to promote cross-institutional outreach about the digital humanities text data mining legal landscape.

The Institute supports 32 participants based in the United States — 16 digital humanities researchers and 16 digital humanities professionals. Digital humanities professionals are people like librarians, consultants, and other institutional staff who conduct digital humanities text data mining or aid researchers in their text data mining research. Participation from pairs of participants is encouraged (e.g. one digital humanities researcher and one professional affiliated with that same institution, organization, or digital humanities project). The Institute will be taught by a combination of experienced legal scholars, digital humanities professionals, librarians, faculty, and researchers — all of whom are immersed in the Institute’s subject literacies and workflows.

To apply, email a current CV and a two-page letter of interest to contact-building-lltdm@googlegroups.com. The letter should address your experience with or interest in the intersection of text data mining in digital humanities research and the law, as well as your goals for applying the knowledge gained from the program. Applications are due December 20, 2019 by 5 p.m. PST. Selection notifications will go out in February 2020.

Visit the Building LLTDM website for more information.


Workshop: The Long Haul: Best Practices for Making Your Digital Project Last

Digital Publishing Workshop Series

You’ve invested a lot of work in creating a digital project, but how do you ensure it has staying power? We’ll look at choices you can make at the beginning of project development to influence sustainability, best practices for documentation and asset management, and how to sunset your project in a way that ensures long-term access for future researchers. Register at bit.ly/dp-berk

Upcoming Workshops in this Series 2019-2020:

  • Check back in Spring!

Please see bit.ly/dp-berk for details.


Publish your scholarship like a pro!

Woman wearing gold watch, sitting at table, typing on a Microsoft Surface notebook
Photograph by Women of Color in Tech, CC-BY 2.0.

We’re more than a month into the fall semester, and if you’re a graduate student or postdoc you’ve probably been thinking about some of the milestones on your horizon, from filing your thesis or dissertation to pitching your first book project or looking for a job.

While we can’t write your dissertation or submit your job application for you, the Library can help in other ways! We are collaborating with GradPro to offer a series of professional development workshops for grad students, postdocs, and other early career scholars to guide you through important decisions and tasks in the research and publishing process, from preparing your dissertation to building a global audience for your work.

  • October 22: Copyright and Your Dissertation
  • October 23: From Dissertation to Book: Navigating the Publication Process
  • October 25: Managing and Maximizing Your Scholarly Impact

These sessions are focused on helping early career researchers develop real-world scholarly publishing skills and apply this expertise to a more open, networked, and interdisciplinary publishing environment.

These workshops are also taking place during Open Access Week 2019, an annual global effort to bring attention to Open Access around the world and highlight how the free, immediate, online availability of scholarship can remove barriers to information, support emerging scholarship, and foster the spread of knowledge and innovation.

Below is the list of next week’s workshop offerings. Join us for one workshop or all three! Each session will take place at the Graduate Professional Development Center, 309 Sproul Hall. Please RSVP at the links below.

Light refreshments will be served at all workshops.

If you have any questions about these workshops, please get in touch with schol-comm@berkeley.edu. And if you can’t make it to a workshop but still need help with your publishing, we are always here for you!

 

Copyright and Your Dissertation

Workshop | October 22 | 1-2:30 p.m. | 309 Sproul Hall

This workshop will provide you with a practical workflow for navigating copyright questions and legal considerations for your dissertation or thesis. Whether you’re just starting to write or you’re getting ready to file, you can use this workflow to figure out what you can use, what rights you have, and what it means to share your dissertation online.

RSVP (Copyright)

 

From Dissertation to Book: Navigating the Publication Process

Panel Discussion | October 23 | 3-4:30 p.m. | 309 Sproul Hall

Hear from a panel of experts – an acquisitions editor, a first-time book author, and an author rights expert – about the process of turning your dissertation into a book. You’ll come away from this panel discussion with practical advice about revising your dissertation, writing a book proposal, approaching editors, signing your first contract, and navigating the peer review and publication process.

RSVP (Book)

 

Managing and Maximizing Your Scholarly Impact

Workshop | October 25 | 1-2:30 p.m. | 309 Sproul Hall

This workshop will provide you with practical strategies and tips for promoting your scholarship, increasing your citations, and monitoring your success. You’ll also learn how to understand metrics, use scholarly networking tools, evaluate journals and publishing options, and take advantage of funding opportunities for Open Access scholarship.

RSVP (Impact)


Workshop: Publish Digital Books & Open Educational Resources with Pressbooks

Digital Publishing Workshop Series

If you’re looking to self-publish work of any length and want an easy-to-use tool that offers a high degree of customization, allows flexibility with publishing formats (EPUB, MOBI, PDF), and provides web-hosting options, Pressbooks may be great for you. Pressbooks is often the tool of choice for academics creating digital books, open textbooks, and open educational resources, since you can license your materials for reuse however you desire. Learn why and how to use Pressbooks for publishing your original books or course materials. You’ll leave the workshop with a project already under way! Register at bit.ly/dp-berk

Upcoming Workshops in this Series 2019-2020:

  • The Long Haul: Best Practices for Making Your Digital Project Last

Please see bit.ly/dp-berk for details.


New Resource for Digital Scholarship: Gale Digital Scholar Lab

Interested in computational text analysis, but don’t have coding experience? Or perhaps you’ve already written your own Python scripts, but you’re on the lookout for sources to build your text corpus. The Gale Digital Scholar Lab, new to the Library, offers solutions for digital humanities and digital scholarship researchers regardless of your level of technical expertise.

Create Visualizations and Run Computational Analyses in Your Web Browser
The Gale Digital Scholar Lab offers six analysis tools through which you can analyze Gale materials with just a few clicks:

  • “Clustering” analyzes similar words across documents.
  • “Named entity recognition” extracts proper and common nouns and groups them by types such as people, organizations, or dates.
  • “Ngram” looks at the frequency of various terms or phrases.
  • “Parts of speech tagger” considers how authors’ use of speech varies over time.
  • “Sentiment analysis” tallies the positive or negative words in each document to produce a sentiment value.
  • “Topic modeling” collects terms that frequently co-occur across a group of documents.
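
If you are curious what two of these tools do under the hood, here is a minimal Python sketch of an n-gram count and a word-tally sentiment score. It is an illustration only, not the Lab's actual implementation: the function names, the tiny `POSITIVE` and `NEGATIVE` word lists, and the sample sentence are all made up for this example, and a real sentiment lexicon would be far larger.

```python
import re
from collections import Counter

# Tiny illustrative word lists (assumed for this sketch, not from Gale).
POSITIVE = {"good", "happy", "great", "love"}
NEGATIVE = {"bad", "sad", "poor", "hate"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(text, n=2):
    """Count how often each run of n consecutive words appears."""
    tokens = tokenize(text)
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


def sentiment_score(text):
    """Return the number of positive words minus the number of negative words."""
    tokens = tokenize(text)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


sample = ("The good ship had a good crew and a good captain, "
          "but a bad storm made a sad end.")
print(ngram_counts(sample, n=2)["a good"])  # 2
print(sentiment_score(sample))              # 1 (three positive, two negative)
```

A simple tally like this ignores negation and context ("not good" still counts as positive), which is one reason to treat any single sentiment value as a rough signal rather than a finding.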


 

Download Plain-Text Files to Run Your Own Analyses
You can download up to 1000 documents at a time as plain-text files for your personal use. You can run your own analyses on this data and combine it with other text sources to build custom text corpora.
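
As a rough sketch of what combining sources can look like, the Python below reads every plain-text file from one or more folders into a single corpus. The function names are hypothetical, and it assumes your downloads are saved as UTF-8 `.txt` files; adjust it to match how your files are actually named and encoded.

```python
from pathlib import Path


def load_corpus(*folders):
    """Read every .txt file in the given folders into a {name: text} dict."""
    corpus = {}
    for folder in folders:
        folder = Path(folder)
        for path in sorted(folder.glob("*.txt")):
            # Prefix each key with the folder name so that files with the
            # same filename from different sources do not overwrite each other.
            key = f"{folder.name}/{path.name}"
            corpus[key] = path.read_text(encoding="utf-8", errors="replace")
    return corpus


def corpus_size(corpus):
    """Total number of whitespace-separated words across all documents."""
    return sum(len(text.split()) for text in corpus.values())
```

For example, `load_corpus("gale_downloads", "other_texts")` (both folder names are placeholders) would return one dictionary covering both sources, which you could then pass to your own analysis scripts.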

What Content Is Available?
The Gale Digital Scholar Lab includes 160 million pages of Gale Primary Sources content from the following primary source digital archives:

17th and 18th Century Burney Collection
American Civil Liberties Union Papers, 1912-1990
American Fiction
Archives Unbound
Archives of Sexuality & Gender
British Library Newspapers
The Economist Historical Archive
Eighteenth Century Collections Online
Indigenous Peoples: North America
The Making of Modern Law: Foreign Primary Sources
The Making of Modern Law: Foreign, Comparative, and International Law, 1600-1926
The Making of Modern Law: Legal Treatises, 1800-1926
The Making of Modern Law: Primary Sources
The Making of Modern Law: Trials, 1600-1926
The Making of the Modern World
Nineteenth Century Collections Online
Nineteenth Century U.S. Newspapers
Sabin Americana, 1500-1926
The Sunday Times Digital Archive
The Times Digital Archive
The Times Literary Supplement Historical Archive
U.S. Declassified Documents Online

Additional Features

  • View scans of original documents side-by-side with OCR plain text
  • Work iteratively with your content set to refine your results
  • Easily clean your data right in the Gale Digital Scholar Lab interface and create custom text-cleaning templates
  • Work with materials and tools in other languages

How to Get Started

  • Visit the Gale Digital Scholar Lab
  • Log in with your Google or Microsoft OneDrive credentials (a personal account is needed so you can create and save personalized datasets)
  • Create your dataset by searching through the materials in the Lab.
  • Run analyses on your dataset right in the web browser and get immediate results, or download your dataset to your computer to run your own scripts.

Workshop: Copyright and Fair Use for Digital Projects

Digital Publishing Workshop Series

This training will help you navigate the copyright, fair use, and usage rights of including third-party content in your digital project. Whether you seek to embed video from other sources for analysis, post material you scanned from a visit to the archives, add images, upload documents, or more, understanding the basics of copyright and discovering a workflow for answering copyright-related digital scholarship questions will make you more confident in your publication. We will also provide an overview of your intellectual property rights as a creator and ways to license your own work. Register at bit.ly/dp-berk

Upcoming Workshops in this Series 2019-2020:

  • Publish Digital Books & Open Educational Resources with Pressbooks
  • The Long Haul: Best Practices for Making Your Digital Project Last

Please see bit.ly/dp-berk for details.


Universitas Linguarum

The Languages of Berkeley: An Online Exhibition


Linguarum enim inscitia disciplinas universas aut exstinxit, aut depravavit…

For ignorance of languages either marred or abolished the world of learning….

—Erasmus, 1529, De pueris statim ac liberaliter instituendis. Opera I, 377

Berkeley’s celebration of languages in the Library could not come at a better moment. We are living in a time when many Americans are smugly self-satisfied about speaking English Only, when our government has waged an ugly war against immigrants, when linguistic and cultural otherness is too often construed as a threat, and when the world of learning is narrowing to a point where it may again be falling on unfortunate times.

The national trends are clear. A recent report from the Modern Language Association shows that 651 foreign language programs in American colleges and universities were lost between 2013 and 2016. And these are not all “less commonly taught” languages: according to the MLA report, during the 2013-16 period, net losses included 129 French programs, 118 Spanish programs, 86 German programs, and 56 Italian programs. Since 2009, overall foreign language enrollments have declined by 15.3 percent nationally. A recent Pew Research Center study showed that only 20% of American K-12 students study a foreign language (as compared to 92% in Europe).

Berkeley is not immune to decreases in language enrollments, but our programs remain unusually strong and have been staunchly supported by the Berkeley administration. In any given year, between 50 and 60 languages are taught on campus, and this remarkable breadth reflects the diversity of the State of California and the backgrounds and research interests of our students and faculty. California leads the nation in linguistic diversity: 42% of Californians speak a language other than English in their homes (as of 2016), and California has more than a hundred indigenous languages. Not surprisingly, this year’s incoming students speak more than 20 languages.

Globalization is ostensibly a strong impetus for language study — and it is in most parts of the world, where knowledge of English and other major languages is viewed as a fundamental necessity for participation in the global economy. However, in the U.S., it seems that globalization has had the opposite effect, leading many Americans to adopt a complacent attitude: why study other languages when so much of the world revolves around English?

Berkeley resists such complacency. We recognize that knowing other languages opens up fresh perspectives on the world, on our relationships with others, on our own language and culture, on the various disciplines we study, and on the problems we strive to solve. Indeed, so many of the challenges we face today are global in nature and can only be approached through the multiplicity of perspectives that come with international cooperation and collaboration. While English may allow for broad sharing of information, the reality is that we will never fully understand the nuances of other peoples’ perspectives if we don’t speak their language. Furthermore, because language, thought, and identity are so intimately intertwined, acquiring languages other than our mother tongue enriches our very being, allowing us to take on new identities, adopt new attitudes and beliefs, develop greater cognitive flexibility, and understand ourselves and our culture in a new light. Seeing the world through the lens of another language and culture also fosters empathy, which is essential to counter increasingly pervasive waves of ethno-nationalism.

Our university library reflects this awareness that languages nourish our imagination, enhance our creativity, and broaden and deepen our understanding of worlds past and present. More than half of the 13 million volumes in UC Berkeley’s collection are in languages other than English. Remembering that the word university derives from the Latin universitas, signifying both universality and community, let us celebrate together the rich diversity of the Library’s holdings and of languages on the Berkeley campus.

Rick Kern,
Professor, Department of French
Director, Berkeley Language Center


The Languages of Berkeley is a dynamic online sequential exhibition celebrating the diversity of languages that have advanced research, teaching and learning at the University of California, Berkeley. It is made possible with support from the UC Berkeley Library and is co-sponsored by the Berkeley Language Center (BLC).

Follow The Languages of Berkeley!
Subscribe by email
Contact/Feedback
www.ucblib.link/languages

What’s your favorite language?


Team Awarded Grant to Help Digital Humanities Scholars Navigate Legal Issues of Text Data Mining

We are thrilled to share that the National Endowment for the Humanities (NEH) has awarded a $165,000 grant to a UC Berkeley-led team of legal experts, librarians, and scholars who will help humanities researchers and staff navigate complex legal questions in cutting-edge digital research.

What is this grant all about?

If you were to crack open some popular English-language novels written in the 1850s (say, ones from Brontë, Hawthorne, Dickens, and Melville), you would find they describe men and women in very different terms. While a male character might be said to “get” something, a female character is more likely to have “felt” it. Whereas the word “mind” might be used when describing a man, the word “heart” is more likely to be used about a woman. Yet, as the 19th century became the 20th, these descriptive differences between genders actually diminished. How do we know all this? We confess we have not actually read every novel written between the 19th and 21st centuries (though we’d love to envision a world in which we could). Instead, we can make this assertion because researchers (including David Bamman, of UC Berkeley’s School of Information) used automated techniques to extract information from the novels, and analyzed these word usage trends at scale. They crafted algorithms to turn the language of those novels into data about the novels.
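
To give a feel for what turning language into data can mean, here is a toy Python sketch that counts which word most often follows “he” and “she” in a passage. It is not the method the researchers used, which was far more sophisticated; the function name and the sample sentences are invented for illustration.

```python
import re
from collections import Counter


def words_after_pronouns(text):
    """Count the word that immediately follows 'he' and 'she'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {"he": Counter(), "she": Counter()}
    for current, following in zip(tokens, tokens[1:]):
        if current in counts:
            counts[current][following] += 1
    return counts


sample = "He got the letter. She felt a chill. He got up, and she felt better."
result = words_after_pronouns(sample)
print(result["he"].most_common(1))   # [('got', 2)]
print(result["she"].most_common(1))  # [('felt', 2)]
```

Run over thousands of novels and grouped by publication decade, counts like these become the kind of data in which trends over time can start to show up.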

In fields of inquiry like the digital humanities, the application of such automated techniques and methods for identifying, extracting, and analyzing patterns, trends, and relationships across large volumes of unstructured or thinly structured digital content is called “text data mining.” (You may also see it referred to as “text and data mining” or “computational text analysis.”) Text data mining provides humanists and social scientists with invaluable frameworks for sifting, organizing, and analyzing vast amounts of material. For instance, these methods make it possible to:

The Problem

Until now, humanities researchers conducting text data mining have had to navigate a thicket of legal issues without much guidance or assistance. For instance, imagine that researchers need to scrape content about Egyptian artifacts from online sites or databases, or download videos about Egyptian tomb excavations, in order to conduct their automated analysis. And then imagine the researchers also want to share these content-rich data sets with others to encourage research reproducibility or enable other researchers to query the data sets with new questions. This kind of work can raise issues of copyright, contract, and privacy law, not to mention ethics if there are issues of, say, indigenous knowledge or cultural heritage materials plausibly at risk. Indeed, in a recent study of humanities scholars’ text analysis needs, participants noted that access to and use of copyright-protected texts was a “frequent obstacle” in their ability to select appropriate texts for text data mining.

Potential legal hurdles do not just deter text data mining research; they also bias it toward particular topics and sources of data. In response to confusion over copyright, website terms of use, and other perceived legal roadblocks, some digital humanities researchers have gravitated to low-friction research questions and texts to avoid decision-making about rights-protected data. They use texts that have entered into the public domain or use materials that have been flexibly licensed through initiatives such as Creative Commons or Open Data Commons. When researchers limit their research to such sources, it is inevitably skewed, leaving important questions unanswered, and rendering resulting findings less broadly applicable. A growing body of research also demonstrates how race, gender, and other biases found in openly available texts have contributed to and exacerbated bias in developing artificial intelligence tools. 

The Solution

The good news is that the NEH has agreed to support an Institute for Advanced Topics in the Digital Humanities to help key stakeholders to learn to better navigate legal issues in text data mining. Thanks to the NEH’s $165,000 grant, Rachael Samberg of UC Berkeley Library’s Office of Scholarly Communication Services will be leading a national team (identified below) from more than a dozen institutions and organizations to teach humanities researchers, librarians, and research staff how to confidently navigate the major legal issues that arise in text data mining research. 

Our institute is aptly called Building Legal Literacies for Text Data Mining (Building LLTDM), and will run from June 23-26, 2020 in Berkeley, California. Institute instructors are legal experts, humanities scholars, and librarians immersed in text data mining research services, who will co-lead experiential meeting sessions empowering participants to put the curriculum’s concepts into action.

In October, we will issue a call for participants, who will receive stipends to support their attendance. We will also be publishing all of our training materials in an openly-available online book for researchers and librarians around the globe to help build academic communities that extend these skills.

Building LLTDM team member Matthew Sag, a law professor at Loyola University Chicago School of Law and leading expert on copyright issues in the digital humanities, said he is “excited to have the chance to help the next generation of text data mining researchers open up new horizons in knowledge discovery. We have learned so much in the past ten years working on HathiTrust [a text-minable digital library] and related issues. I’m looking forward to sharing that knowledge and learning from others in the text data mining community.” 

Team member Brandon Butler, a copyright lawyer and library policy expert at the University of Virginia, said, “In my experience there’s a lot of interest in these research methods among graduate students and early-career scholars, a population that may not feel empowered to engage in ‘risky’ research. I’ve also seen that digital humanities practitioners have a strong commitment to equity, and they are working to build technical literacies outside the walls of elite institutions. Building legal literacies helps ease the burden of uncertainty and smooth the way toward wider, more equitable engagement with these research methods.”

Kyle K. Courtney of Harvard University serves as Copyright Advisor at Harvard Library’s Office for Scholarly Communication, and is also a Building LLTDM team member. Courtney added, “We are seeing more and more questions from scholars of all disciplines around these text data mining issues. The wealth of full-text online materials and new research tools provide scholars the opportunity to analyze large sets of data, but they also bring new challenges having to do with the use and sharing not only of the data but also of the technological tools researchers develop to study them. I am excited to join the Building LLTDM team and help clarify these issues and empower humanities scholars and librarians working in this field.”

Megan Senseney, Head of the Office of Digital Innovation and Stewardship at the University of Arizona Libraries, reflected on the opportunities for ongoing library engagement that extend beyond the initial institute. Senseney said, “Establishing a shared understanding of the legal landscape for TDM is vital to supporting research in the digital humanities and developing a new suite of library services in digital scholarship. I’m honored to work and learn alongside a team of legal experts, librarians, and researchers to create this institute, and I look forward to integrating these materials into instruction and outreach initiatives at our respective universities.”

Next Steps

The Building LLTDM team is excited to begin supporting humanities researchers, staff, and librarians en route to important knowledge creation. Stay tuned if you are interested in participating in the institute. 

In the meantime, please join us in congratulating all the members of the project team:

  • Rachael G. Samberg (University of California, Berkeley) (Project Director)
  • Scott Althaus (University of Illinois, Urbana-Champaign)
  • David Bamman (University of California, Berkeley)
  • Sara Benson (University of Illinois, Urbana-Champaign)
  • Brandon Butler (University of Virginia)
  • Beth Cate (Indiana University, Bloomington)
  • Kyle K. Courtney (Harvard University)
  • Maria Gould (California Digital Library)
  • Cody Hennesy (University of Minnesota, Twin Cities)
  • Eleanor Koehl (University of Michigan)
  • Thomas Padilla (University of Nevada, Las Vegas; OCLC Research)
  • Stacy Reardon (University of California, Berkeley)
  • Matthew Sag (Loyola University Chicago)
  • Brianna Schofield (Authors Alliance)
  • Megan Senseney (University of Arizona)
  • Glen Worthey (Stanford University)

German

The Languages of Berkeley: An Online Exhibition

Faust, ein Fragment (1790), Deutsches Textarchiv

Although based on a legend transmitted through the popular literature and drama of German-speaking Europe from the late 16th century onward (and which found an English-speaking audience through translation of the texts and Christopher Marlowe’s dramatic adaptation), Goethe’s own version of Faust lives at the heart of the German literary canon. The play’s “pact with the Devil” narrative tells the story of Dr. Faust, who, seeking deeper knowledge than the academy can provide, strikes a bargain with Mephistopheles which requires him to serve Faust and to show him all of the truths in the world. However, should Faust ever become complacent, his life would be forfeit. A series of fantastic, and tragic, events follows, and in the end Faust finds that his life is at risk. 

Goethe calls upon a variety of meters to tell his tale, which combines elements of contemporary European society with classical themes. He worked on the play intermittently over the course of nearly 50 years beginning in the 1770s (from which a copied manuscript survives), and after releasing his early efforts as Faust, ein Fragment in 1790, decided that the full play should be published as two parts: Part I, published in 1808, and Part II, published posthumously in 1832.  Goethe’s Faust would become highly influential, inspiring music, theater, opera, film, and literature (including Bulgakov’s The Master and Margarita) from the 19th century to the present. UC Berkeley Library owns numerous editions of the text, including the initial 1790 publication which was included in a multi-volume set of Goethe’s collected works and is housed in The Bancroft Library. A new project funded by the German Research Foundation called Faustedition has made Faust even more accessible by putting the full text online, and allowing line-by-line reading of variations across editions. Importantly, the project also includes an online archive of Goethe’s handwritten papers and letters, transcribed and searchable, which are related to the development of Faust.

The German language and its literature have been a fixture at Berkeley since the university’s founding. Today, the German Department offers courses at all levels, spanning the Middle Ages to the 21st century. In addition to Modern German, earlier forms of the language including Old Saxon, Old High German, Middle High German, and Early New High German are all taught. Goethe’s writings continue to be studied and read extensively.

Contribution by Jeremy Ott
Classics and Germanic Studies Librarian, Doe Library

Title: Faust
Title in English: Faust
Author: Goethe, Johann Wolfgang von, 1749-1832.
Imprint: Leipzig: Christian Friedrich Solbrig, 1790.
Edition: 1st [?]
Language: German
Language Family: Indo-European, Germanic
Source: Deutsche Forschungsgemeinschaft (DFG) | German Research Foundation
URL: http://faustedition.net

Other online editions:

Print editions at Berkeley:



Workshop: Publish Digital Books & Open Educational Resources with Pressbooks

Digital Publishing Workshop Series

Publish Digital Books & Open Educational Resources with Pressbooks
Monday, May 6, 11:10am-12:30pm
Academic Innovation Studio, Dwinelle Hall 117 (Level D)

If you’re looking to self-publish work of any length and want an easy-to-use tool that offers a high degree of customization, allows flexibility with publishing formats (EPUB, MOBI, PDF), and provides web-hosting options, Pressbooks may be great for you. Pressbooks is often the tool of choice for academics creating digital books, open textbooks, and open educational resources, since you can license your materials for reuse however you desire. Learn why and how to use Pressbooks for publishing your original books or course materials. You’ll leave the workshop with a project already under way! Register at bit.ly/dp-berk