Workshop Date/Time: Wednesday, March 8, 2023, 11:00am–12:30pm
If you are working on a computational text analysis project and have wondered how to legally acquire, use, and publish text and data, this workshop is for you! We will teach you 5 legal literacies (copyright, contracts, privacy, ethics, and special use cases) that will empower you to make well-informed decisions about compiling, using, and sharing your corpus. By the end of this workshop, and with a useful checklist in hand, you will be able to confidently design lawful text analysis projects or be well positioned to help others design such projects. Consider taking alongside Copyright and Fair Use for Digital Projects.
Please sign up today and join us online on March 8.
We are excited to announce that the National Endowment for the Humanities (NEH) has awarded nearly $50,000 to UC Berkeley Library and Internet Archive to study legal and ethical issues in cross-border text data mining. The funding was made possible through NEH’s Digital Humanities Advancement Grant program.
NEH funding for the project, entitled Legal Literacies for Text Data Mining – Cross Border (“LLTDM-X”), will support research and analysis to address law and policy issues faced by U.S. digital humanities practitioners whose text data mining research and practice intersects with foreign-held or -licensed content, or involves international research collaborations.
LLTDM-X builds upon the highly successful Building Legal Literacies for Text Data Mining Institute (Building LLTDM), previously funded by the NEH in 2019. UC Berkeley Library directed Building LLTDM in June 2020, bringing together expert faculty from across the country to train 32 digital humanities researchers on how to navigate law, policy, ethics, and risk within text data mining projects. (All of the results and impacts are summarized in the white paper here.)
In Building LLTDM’s instructional sessions and post-workshop evaluations, participants identified cross-border research collaborations as an ongoing and critical legal and policy problem, and they also noted that foreign law and ethics issues pervaded their research. UC Berkeley Library’s Office of Scholarly Communication Services partnered with Internet Archive to begin to address these essential needs, and LLTDM-X sprang to life.
Why is LLTDM-X needed?
Text data mining, or TDM, is an increasingly essential and widespread research approach. TDM relies on automated techniques and algorithms to extract revelatory information from large sets of unstructured or thinly structured digital content. These methodologies allow scholars to identify and analyze critical social, scientific, and literary patterns, trends, and relationships across volumes of data that would otherwise be impossible to sift through.
While TDM methodologies offer great potential, they also present scholars with nettlesome law and policy challenges that can prevent them from understanding how to move forward with their research. Building LLTDM trained TDM researchers and professionals on essential principles of copyright, licensing, and privacy law, as well as ethics—thereby helping them move forward with impactful digital humanities research.
As Building LLTDM revealed, United States digital humanities scholars do not conduct text data mining research only in or about the U.S. Further, digital humanities research in particular is marked by collaboration across institutions and geographical boundaries. Yet, U.S. practitioners encounter expanding and increasingly complex cross-border problems.
For example, U.S. contract law may supersede rights under copyright, such that a U.S. database license agreement may prohibit text data mining and other fair uses, whereas UK licenses cannot. Therefore, U.S. TDM practitioners collaborating with UK-based colleagues face impactful choices about which agreements to apply, as this may determine whether text data mining is permitted. In the U.S., “breaking” technological protection measures to conduct text data mining is now authorized within certain parameters, yet other jurisdictions prohibit such work or apply different conditions. U.S. text data mining researchers must accordingly consider how they work with internationally held or licensed materials or collaborators.
Scholars must parse at least three such “cross-border” TDM scenarios: (i) the materials they want to mine are housed in a foreign jurisdiction, or are otherwise subject to foreign database licensing or laws; (ii) the human subjects they are studying, or the people who created the underlying content, reside in another country; or (iii) the colleagues with whom they are collaborating reside abroad, yielding uncertainty about which country’s laws, agreements, and policies apply.
U.S. researchers are uncertain about how to navigate each of these scenarios. As evidenced in an informal survey that we conducted with digital humanities scholars, 70% of respondents reported cross-border copyright questions, 72% reported uncertainty about cross-border licensing terms, 52% noted privacy issues, and 48% identified ethical concerns. This confusion greatly impacted their TDM research. Twenty-eight percent of respondents confirmed that these cross-border copyright, licensing, privacy, or ethical issues impeded their project or prevented it entirely. Of equal concern is that 40% of responding practitioners reported hesitation to share their workflows, methodology, or sources because of possible cross-border LLTDM issues. Without transparency, findings are deemed unreliable and scholarship may be rejected for publication. These problems will only mount given the increasing collaborativeness of research and the substantial amount of cross-border research occurring.
How will LLTDM-X help the world?
Our long-term goal is to design instructional materials and institutes to support digital humanities TDM scholars facing cross-border issues, but our first step with LLTDM-X is getting a better handle on the specific law and policy challenges they face.
Through a series of virtual roundtable discussions, and accompanying legal research and analysis, LLTDM-X will surface these cross-border issues and begin to distill preliminary guidance to help scholars in navigating them.
The first roundtable will engage U.S. digital humanities text data mining practitioners in sharing their cross-border TDM experiences. U.S. and global law and ethics experts will help guide the roundtable discussion to elicit the contours of practitioner experiences. During two subsequent roundtables—one focusing on cross-border copyright and licensing, and another on cross-border privacy and ethics—the experts will discuss practitioners’ hurdles in depth, and begin to develop customized guidance.
After the roundtables, we will work with the law and ethics experts to create instructive case studies that reflect the types of cross-border TDM issues practitioners encountered. These case studies will incorporate recommendations to help a broad audience of U.S. digital humanities text data mining practitioners navigate LLTDM-X concerns. Case studies, guidance, and recommendations will be widely disseminated via an open access report to be published at the completion of the project. Most importantly, they will be used to inform our future educational offerings.
An experienced team
The team for LLTDM-X (introduced below) is eager to get started. The project is co-directed by Thomas Padilla, Deputy Director, Archiving and Data Services at Internet Archive, who notes:
“LLTDM-X responds strategically to a pervasive challenge that needlessly complicates, inhibits, and weakens the fullest potential of research. This work paves a critical path toward building future training institutes that address cross-border legal issues in TDM. At Internet Archive we’re committed to supporting universal access to all knowledge—LLTDM-X couldn’t be more clearly aligned with what we hope to achieve. We look forward to working with our partners at UC Berkeley Library and the wider community to advance this work.”
Rachael Samberg, who leads UC Berkeley Library’s Office of Scholarly Communication Services and oversaw Building LLTDM, joins Thomas as co-director and explains that:
“We are ready to begin analyzing and sorting out the complex legal challenges for digital humanities TDM researchers. We’ve already secured an incredible group of international legal and ethics experts to conduct the analyses, and will share more on that soon. In the meantime, we are gearing up to build out an even larger group of participating scholars whose experiences will help us create case studies.”
On behalf of the entire project team, we would like to thank NEH’s Office of Digital Humanities again for funding this important work. We invite you to contact us with any questions you may have.
Thomas Padilla (Project Director): Thomas is Deputy Director, Archiving and Data Services at Internet Archive, and has deep experience cultivating library, archive, and museum ability to support TDM research. He has previously served as Principal Investigator of the Andrew W. Mellon-supported Collections as Data: Part to Whole and the Institute of Museum and Library Services-supported Always Already Computational: Collections as Data, and as author of the library community research agenda, Responsible Operations: Data Science, Machine Learning, and AI in Libraries. In addition, Padilla was an expert faculty member for Building LLTDM, the precursor to LLTDM-X.
Rachael Samberg (Project Co-Director): Rachael is Scholarly Communication Officer & Program Director of the University of California, Berkeley Library’s Office of Scholarly Communication Services. She served as Project Director and legal expert for Building LLTDM. A Duke Law graduate, Rachael practiced intellectual property litigation at Fenwick & West LLP for seven years before spending six years at Stanford Law School’s library, where she was Head of Reference & Instructional Services and a Lecturer in Law. Rachael speaks throughout the country about copyright and TDM issues, about which she is widely published. Her chapter, Law & Literacy in Non-Consumptive Text Mining, was published in Copyright Conversations (ALA, 2019).
Stacy Reardon (Project Team Member): Stacy Reardon is Literatures and Digital Humanities Librarian at the University of California, Berkeley Library, where she provides guidance and instruction on digital humanities projects and methods. Stacy served as a library expert on the Project Team for the NEH-funded Building Legal Literacies for Text Data Mining. She is co-chair of UC Berkeley’s Digital Humanities Working Group, and received her Ph.D. in literature from the University of Massachusetts, Amherst.
Timothy Vollmer (Project Manager): Timothy Vollmer is Scholarly Communication and Copyright Librarian at UC Berkeley Library. He served as Project Manager for the NEH-funded Building Legal Literacies for Text Data Mining. Tim worked as a senior public policy manager for Creative Commons, and contributed to writing and advocacy on the text data mining exceptions in the EU’s Directive on Copyright in the Digital Single Market. He formerly was the Assistant Director to the Program on Public Access to Information at the American Library Association.
It’s 2022, and we’re right back at it with supporting your scholarship and publishing. This Spring, the Office of Scholarly Communication Services has some practical workshops for you as part of the Library’s Digital Publishing Series. Here’s what’s coming up over the next few months.
Publish Digital Books and Open Educational Resources with Pressbooks
February 8, 2022
Online: Register to receive the Zoom link
If you’re looking to self-publish work of any length and want an easy-to-use tool that offers a high degree of customization, allows flexibility with publishing formats (EPUB, PDF), and provides web-hosting options, Pressbooks may be great for you. Pressbooks is often the tool of choice for academics creating digital books, open textbooks, and open educational resources, since you can license your materials for reuse however you desire. Learn why and how to use Pressbooks for publishing your original books or course materials. You’ll leave the workshop with a project already under way! Sign up at the link above and the Zoom login details will be emailed to you.
Can I Mine That? Should I Mine That?: A Clinic for Copyright, Ethics & More in TDM Research
March 9, 2022
Online: Register to receive the Zoom link
If you are working on a computational text analysis project and have wondered how to legally acquire, use, and publish text and data, this workshop is for you! We will teach you 5 legal literacies (copyright, contracts, privacy, ethics, and special use cases) that will empower you to make well-informed decisions about compiling, using, and sharing your corpus. By the end of this workshop, and with a useful checklist in hand, you will be able to confidently design lawful text analysis projects or be well positioned to help others design such projects. Sign up at the link above and the Zoom login details will be emailed to you.
Other ways we can help
In addition to the workshops, we’re here to help answer a variety of questions you might have on intellectual property, digital publishing, and information policy.
- Check out our website for information on issues such as copyright and fair use, text data mining, and UC’s Open Access Policy.
- Interested in publishing your research Open Access? UCB Library can help defray the costs of an article processing charge (up to $2,500) or book processing charge (up to $10,000). See the Berkeley Research Impact Initiative (BRII) for more information. And explore the various UC-wide transformative open access agreements that can help UC corresponding authors publish their scholarship open access.
- Do you want to create an open digital textbook? Take a look at UC Berkeley’s Open Book Publishing platform (anyone with a @berkeley.edu email can sign up for a free account), and get in touch with us about our Open Educational Resources (OER) grant program.
- Keep an eye on our events calendar for more workshops and trainings.
- Follow our blog, social media, and YouTube channel.
Want help or more information? Send us an email. We can provide individualized support and personal consultations, online class instruction, presentations and workshops for small or large groups & classes, and customized support and training for departments and disciplines.
Last summer we hosted the Building Legal Literacies for Text Data Mining institute. We welcomed 32 digital humanities researchers and professionals to the weeklong virtual training, with the goal of empowering them to confidently navigate law, policy, ethics, and risk within digital humanities text data mining (TDM) projects. Building Legal Literacies for Text Data Mining (Building LLTDM) was made possible through a grant from the National Endowment for the Humanities.
Following the remote institute in June 2020, the participants and project team reconvened in February 2021 to discuss how participants had been thinking about, performing, or supporting TDM in their home institutions and projects with the law and policy literacies in mind.
To maximize the reach and impact of Building LLTDM, we have now published a comprehensive open educational resource (OER) of the contents of the institute. The OER covers copyright (both U.S. and international law), technological protection measures, privacy, and ethical considerations. It also helps other digital humanities professionals and researchers run their own similar institutes by describing in detail how we developed and delivered programming (including our pedagogical reflections and take-aways), and includes ideas for hosting shorter literacy teaching sessions. The resource (available as a web-book or in downloadable formats including PDF and EPUB) is in the public domain under the CC0 Public Domain Dedication, meaning it can be accessed, reused, and repurposed without restriction.
In addition to the OER, we’ve also published a white paper that describes the institute’s origins and goals, project overview and activities, and reflections and possible follow-on actions.
Thank you to the National Endowment for the Humanities, the project team, institute participants, and staff at the UC Berkeley Library for making Building LLTDM a success.
[Note: this content is cross-posted on the LLTDM blog.]