Tag: scholarly communication
UC Berkeley Library to Copyright Office: Protect fair uses in AI training for research and education

We are pleased to share the UC Berkeley Library’s response to the U.S. Copyright Office’s Notice of Inquiry regarding artificial intelligence and copyright. Our response addresses the essential fair use right relied upon by UC Berkeley scholars in undertaking groundbreaking research, and the need to preserve access to the underlying copyright-protected content so that scholars using AI systems can conduct research inquiries.
In this blog post, we explain what the Copyright Office is studying, and why it was important for the Library to make scholars’ voices heard.
What the Copyright Office is studying and why
Broadly speaking, the Copyright Office wants to understand how to set policy for copyright issues raised by artificial intelligence (“AI”) systems.
Over the last year, AI systems and the rapid growth of their capabilities have attracted significant attention. One type of AI, referred to as “generative AI”, is capable of producing outputs such as text, images, video, or audio (including emulating a human voice) that would be considered copyrightable if created by a human author. These systems include, for instance, the chatbot ChatGPT, and text-to-image generators like DALL·E, Midjourney, and Stable Diffusion. A user can prompt ChatGPT to write a short story that features a duck and a frog who are best friends, or prompt DALL·E to create an abstract image in the style of a Jackson Pollock painting. Generative AI systems are relevant to, and affect, many educational activities on a campus like UC Berkeley, but (at least to date) they have not been key facilitators of campus research methodologies.
Instead, in the context of research, scholars have been relying on AI systems to support a set of research methodologies referred to as “text and data mining” (or TDM). TDM uses computational tools, algorithms, and automated techniques to extract revelatory information from large sets of unstructured or thinly structured digital content. Imagine you have a book like “Pride and Prejudice.” Depending on your scholarly inquiry, a vast amount of information can be drawn from that book, such as how many female vs. male characters there are, what types of words the female characters use as opposed to the male characters, what types of behaviors the female characters display relative to the males, etc. TDM allows researchers to identify and analyze patterns, trends, and relationships across volumes of data that would be impossible to sift through by closely examining one book or item at a time.
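To give a concrete sense of what the simplest kind of TDM looks like in practice, here is a minimal Python sketch, assuming a corpus already loaded as plain text. It counts how often a set of target words appears in each document, and it runs a crude proximity check that flags target words appearing within a few words of a negation. The word lists, window size, sample sentences, and function names are all hypothetical illustrations, not the Library's tools or any particular project's method. A real study would use proper tokenization and carefully curated lexicons.

```python
import re
from collections import Counter

# Hypothetical lexicon of target terms; a real study would curate a much larger list.
HAPPY_WORDS = {"happiness", "happy", "joy", "mirth", "contentment"}
# Hypothetical negation cues for a crude proximity check.
NEGATIONS = {"not", "no", "never", "without"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequencies(corpus: dict[str, str], terms: set[str]) -> dict[str, Counter]:
    """Count how often each target term occurs in each document of the corpus."""
    return {
        title: Counter(tok for tok in tokenize(text) if tok in terms)
        for title, text in corpus.items()
    }


def negated_mentions(text: str, terms: set[str], window: int = 3) -> int:
    """Count target terms that have a negation within `window` tokens before them."""
    tokens = tokenize(text)
    count = 0
    for i, tok in enumerate(tokens):
        if tok in terms and NEGATIONS & set(tokens[max(0, i - window):i]):
            count += 1
    return count


if __name__ == "__main__":
    # Two invented sentences standing in for a corpus of novels.
    corpus = {
        "story_a": "She felt pure joy, a happiness she had never known.",
        "story_b": "He was not happy; there was no joy in the house.",
    }
    for title, counts in term_frequencies(corpus, HAPPY_WORDS).items():
        print(title, dict(counts), "negated:", negated_mentions(corpus[title], HAPPY_WORDS))
```

Nothing here is "trained": the researcher decides in advance which words matter, and the program only counts them.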
Not all TDM research methodologies require AI systems to extract this information. For instance, in the “Pride and Prejudice” example above, TDM can sometimes be performed by developing algorithms to detect the frequency of certain words within a corpus, or to parse sentiments based on the proximity of various words to each other. In other cases, though, scholars must use machine learning techniques to train AI models before the models can make a variety of assessments.
Here is an illustration of the distinction: Imagine a scholar wishes to assess how often 20th-century fiction authors write about notions of happiness. The scholar would likely compile a corpus of thousands or tens of thousands of works of fiction, and then run a search algorithm across the corpus to detect the occurrence or frequency of words like “happiness,” “joy,” “mirth,” “contentment,” and their synonyms and variations. But if the scholar instead wanted to establish the presence of fictional characters who embody or display characteristics of being happy, the scholar would need to employ discriminative modeling (an approach used for classification and regression tasks) to train an AI model to recognize the appearance of happiness by looking for recurring indicia of character psychology, behavior, attitude, conversational tone, demeanor, appearance, and more. This does not use a generative AI system to create new outputs; rather, it trains a non-generative AI system to predict or detect features of existing content. And to undertake this type of non-generative AI training, a scholar would need to use a large volume of often copyright-protected works.
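By contrast with keyword counting, a discriminative model learns its own indicators from labeled examples. The sketch below trains a tiny logistic regression classifier (one common discriminative model, chosen here for illustration) in plain Python on bag-of-words features, using stochastic gradient descent. The four training passages and their labels are invented, and the function names are hypothetical. An actual study would train on thousands of annotated passages drawn from the corpus, typically with an established library such as scikit-learn and richer features than word counts.

```python
import math
import re
from collections import Counter


def features(text: str) -> Counter:
    """Bag-of-words features: lowercase token counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights: dict[str, float], bias: float, feats: Counter) -> float:
    """Linear score: bias plus the weighted sum of feature counts."""
    return bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())


def train(examples: list[tuple[str, int]], epochs: int = 200, lr: float = 0.5):
    """Fit logistic regression weights by stochastic gradient descent on log-loss.

    Each example is (passage, label), with label 1 if the passage depicts a
    happy character and 0 otherwise.
    """
    weights: dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = features(text)
            error = sigmoid(score(weights, bias, feats)) - label  # d(log-loss)/d(score)
            bias -= lr * error
            for w, c in feats.items():
                weights[w] = weights.get(w, 0.0) - lr * error * c
    return weights, bias


def predict_happy(weights: dict[str, float], bias: float, text: str) -> float:
    """Estimated probability that a passage depicts a happy character."""
    return sigmoid(score(weights, bias, features(text)))


if __name__ == "__main__":
    # Invented, hand-labeled toy passages; real studies use thousands of annotations.
    training = [
        ("she laughed and danced with delight", 1),
        ("he grinned and cheered with delight", 1),
        ("she wept and sighed in despair", 0),
        ("he scowled and sulked in despair", 0),
    ]
    weights, bias = train(training)
    for passage in ["they cheered with delight", "they sulked in despair"]:
        print(passage, round(predict_happy(weights, bias, passage), 3))
```

The key difference from the counting approach is that the model's weights come from the training texts themselves, which is why this kind of work depends on access to a large body of (often copyright-protected) works.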
The Copyright Office is studying both kinds of AI systems: generative and non-generative. The Office is asking a variety of questions because it has been contacted by stakeholders across sectors and industries with diverse views about how AI systems should be regulated. Questions raised by stakeholders include:
- Who is the “author” of generative AI outputs?
- Should people whose voices or images are used to train generative AI systems have a say in how their voices or images are used?
- Should the creator of an AI system (whether generative or non-generative) need permission from copyright holders to use copyright-protected materials in training the AI to predict and detect things?
- Should copyright owners get to opt out of having their content used to train AI?
- Should ethics be considered within copyright regulation?
Several of these questions are already the subject of pending litigation. While these questions are being explored by the courts, the Copyright Office wants to understand the entire landscape better as it considers what kinds of AI copyright regulations to enact.
The copyright law and policy landscape underpinning the use of AI models is complex, and whatever regulatory decisions the Copyright Office makes will have ramifications for global enterprise, innovation, and trade. The Copyright Office’s inquiry thus raises significant and timely legal questions, many of which we are only beginning to understand.
For these reasons, the Library has taken a cautious and narrow approach in its response to the inquiry: we address only two key principles of fair use and licensing, as they bear upon the nonprofit education, research, and scholarship undertaken by scholars who rely on (typically non-generative) AI models. In brief, the Library wants to ensure that (1) the voices of scholars, and of the academic libraries that support them, are heard in order to preserve fair use in training AI, and (2) copyright-protected content remains available for AI training to support nonprofit education and research.
Why the study matters for fair use
Previous court cases like Authors Guild v. HathiTrust, Authors Guild v. Google, and A.V. ex rel. Vanderhye v. iParadigms have addressed fair use in the context of TDM and determined that the reproduction of copyrighted works to create and text mine a collection of copyright-protected works is a fair use. These cases further hold that making derived data, results, abstractions, metadata, or analysis from the copyright-protected corpus available to the public is also fair use, as long as the research methodologies or data distribution processes do not re-express the underlying works to the public in a way that could supplant the market for the originals. Performing all of this work is essential for TDM-reliant research studies.
For the same reasons that the TDM process is a fair use of copyrighted works, training AI tools to perform that TDM should also be a fair use, in large part because training does not reproduce or communicate the underlying copyrighted works to the public. Here it is important to distinguish between training inputs and outputs, because whether a given generative AI output is a fair use cannot always be predicted in advance. The mechanics of generative models suggest that there are limited instances in which generative AI outputs could be substantially similar to (and potentially infringing of) the underlying works used for training; such substantial similarity typically arises only when a training corpus is rife with numerous copies of the same work. By contrast, training AI models on copyright-protected inputs falls squarely within what courts have determined to be a transformative fair use, especially when that training is for nonprofit educational or research purposes. It is therefore essential to protect the fair use rights of scholars and researchers to make these uses of copyright-protected works when training AI.
Further, if these fair use rights were overridden by limiting AI training to only “safe” materials (like public domain works or works for which training permission has been granted via license), the result would exacerbate bias in the research questions that could be studied and the methodologies available to study them, and would amplify the views of an unrepresentative set of creators, given the limited types of materials available for these studies.
Why access to AI training content should be preserved
For the same reasons, it is important to preserve scholars’ ability to access the underlying content to conduct AI training. The fair use provision of the Copyright Act does not give copyright owners a right to opt out of others’ fair uses of their works, and for good reason: if content creators were able to opt out, the provision for fair use would be undermined, and little content would be available to build upon for the advancement of science and the useful arts. Accordingly, to the extent that the Copyright Office is considering creating a regulatory right for creators to opt out of having their works included in AI training, it is paramount that any such opt-out provision not extend to AI training or other activities that constitute fair use, particularly in the nonprofit educational and research contexts.
AI training opt-outs would pose a particular threat to research and education because fair use in these contexts is already becoming an out-of-reach luxury, even for the wealthiest institutions. Academic libraries are forced to pay significant sums each year to try to preserve fair use rights for campus scholars through the database and electronic content license agreements that libraries sign. In the U.S., the prospect of “contractual override” means that, although fair use is statutorily provided for, private parties (like publishers) may “contract around” fair use by requiring libraries to negotiate for otherwise lawful activities (such as conducting TDM or training AI for research), and often to pay additional fees for the right to conduct these lawful activities on top of the cost of licensing the content itself. When such costs are beyond institutional reach, the publisher or vendor may then offer similar contractual terms directly to research teams, who may feel obliged to agree in order to get access to the content they need. Vendors may charge tens or even hundreds of thousands of dollars for this type of access.
This “pay-to-play” landscape, in which institutions are charged for the opportunity to rely on existing statutory rights, is particularly detrimental for TDM research methodologies, because TDM research often requires massive datasets containing works from many publishers, including copyright owners who cannot be identified or who are unwilling to grant licenses. If the Copyright Office were to enable rightsholders to opt out of having their works fairly used for training AI, academic institutions and scholars would face even greater hurdles in licensing content for research purposes.
First, it would be operationally difficult for academic publishers and content aggregators to amass and license the “leftover” body of copyrighted works that remain eligible for AI training. Costs associated with publishers’ efforts in compiling “AI-training-eligible” content would be passed along as additional fees charged to academic libraries. In addition, rightsholders might opt out of allowing their work to be used for AI training fair uses, and then turn around and charge AI usage fees to scholars (or libraries)—essentially licensing back fair uses for research. These scenarios would impede scholarship by or for research teams who lack grant or institutional funds to cover these additional expenses; penalize research in or about underfunded disciplines or geographical regions; and result in bias as to the topics and regions studied.
Scholars need to be able to utilize existing knowledge resources to create new knowledge goods. Congress and the Copyright Office clearly understand the importance of facilitating access and usage rights, having implemented the statutory fair use provision without any exclusions or opt-outs. This status quo should be preserved for fair use AI training—and particularly in the nonprofit educational or research contexts.
Our office is here to help
No matter what happens with the Copyright Office’s inquiry and any regulations that ultimately may be established, the UCB Library’s Office of Scholarly Communication Services is here to help you. We are a team of copyright law and information policy (licensing, privacy, and ethics) experts who help UC Berkeley scholars navigate legal, ethical, and policy considerations in utilizing resources in their research and teaching. And we are national and international leaders in supporting TDM research—offering online tools, trainings, and individual consultations to support your scholarship. Please feel free to reach out to us with any questions at schol-comm@berkeley.edu.
Workshop Reminder — How to Publish Open Access at UC Berkeley on October 17, 2023

Date/Time: Tuesday, October 17, 2023, 11:00am–12:30pm
Location: Zoom only. Register via LibCal.
Are you wondering what processes, platforms, and funding are available at UC Berkeley to publish your research open access (OA)? This workshop will provide practical guidance and walk you through all of the OA publishing options and funding sources you have on campus. We’ll explain: the difference between (and mechanisms for) self-depositing your research in the UC’s institutional repository vs. choosing publisher-provided OA; what funding is available to put toward your article or book charges if you choose a publisher-provided option; and the difference between funding coverage under the UC’s systemwide OA agreements vs. the Library’s funding program (Berkeley Research Impact Initiative). We’ll also give you practical tips and tricks to maximize your retention of rights and readership in the publishing process.
Join us next week!
Workshop Reminder — Managing & Maximizing Your Scholarly Impact on October 10, 2023

Date/Time: Tuesday, October 10, 2023, 11:00am–12:30pm
Location: Hybrid: Join in person at 223 Doe Library, or on Zoom. Register via LibCal.
This workshop will provide you with practical strategies and tips for promoting your scholarship, increasing your citations, and monitoring your success. You’ll also learn how to understand metrics, use scholarly networking tools, evaluate journals and publishing options, and take advantage of funding opportunities for Open Access scholarship.
Join us next week!
UC Berkeley author tips: What to do when you have to pay an open access publishing fee
This post provides information to UC Berkeley authors about programs that our Library and the UC system offer to help defray open access article processing charges. It also offers tips about how to plan or budget in advance for these fees when possible.
The University of California has been a long-time supporter of open access publishing, that is, making peer-reviewed scholarship available online without any financial, legal, or technical barriers. Just because the published work is free to read, though, doesn’t mean the publishing enterprise as a whole is “free.” One of the most common ways open access publishers finance journal publishing and production without selling subscriptions is to charge authors a fee to publish, shifting from a system based on paying to read to one based on paying to publish. Of course, not all methods of funding open access require authors to pay publication fees in this way. And in all cases (except the rare instances in which a publisher requests that you waive this right), the UC’s open access policy makes it possible for UC authors to share the author-accepted manuscript version of their articles on eScholarship, the UC’s research repository, immediately upon publication in a journal.
But when a publisher does charge a fee to publish, we want to help you understand what UC Berkeley resources are available—whether from your grant funds or the University of California Libraries—to help with those costs.
Publishers typically refer to author-facing fees as “article processing charges,” or “APCs.” APCs can range from a few hundred dollars to $10,000 or more for some Nature journals.
UC authors may be able to cover or contribute to these fees by leveraging research accounts or grant funds (to the extent available). But there are also other University Library programs available to support payment when research accounts or grant funds are not available.
UC-wide open access publishing agreements will cover some (or all) APCs
UC corresponding authors can take advantage of funding opportunities to defray the cost of publishing their scholarship open access when their grant or other research funds come up short or are not available. The University of California libraries have entered into a growing number of systemwide transformative open access agreements with publishers. These agreements aim to shift scholarly publishing from a subscription-based model to an open access model.
When a UC-affiliated corresponding author has an article accepted for publication in a journal with which the UC has an open access publishing agreement, the UC libraries will pay some or all of the associated publishing fee. When it comes time to pay the APC, the UC libraries will pay at least the first $1,000. If any balance remains, the publisher’s payment system will ask whether the UC author has grant funding available to cover the remainder. If the UC author cannot contribute the remaining balance, the UC libraries will pay the entire APC on their behalf. (Note: in a few instances, such as Nature-branded titles, the UC libraries will contribute a maximum of $1,000 toward the APC.)
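The payment rules above can be summarized as a small decision procedure. The Python sketch below is a simplified model, not an official calculator: it assumes whole-dollar amounts, treats "author has grant funds" as a yes/no answer (in practice an author might contribute only part of the balance), and uses a hypothetical `capped_title` flag for titles such as Nature-branded journals where the libraries pay at most $1,000. Actual terms vary by publisher agreement; the function and field names are illustrative. The example call uses the Nature Communications figures cited in this post.

```python
from dataclasses import dataclass

# UC libraries pay at least the first $1,000 of an APC under a UC agreement (per this post).
UC_FIRST_CONTRIBUTION = 1_000


@dataclass
class ApcSplit:
    library_pays: int
    author_pays: int


def split_uc_agreement_apc(apc: int, author_has_funds: bool, capped_title: bool = False) -> ApcSplit:
    """Simplified split of an APC (in whole dollars) under a UC transformative agreement.

    capped_title: True for titles (e.g., Nature-branded journals) where the UC
        libraries contribute at most $1,000 regardless of the author's funds.
    author_has_funds: whether the author can cover the balance above $1,000.
    """
    first = min(apc, UC_FIRST_CONTRIBUTION)
    remainder = apc - first
    if capped_title or author_has_funds:
        # Libraries pay the first $1,000; the author covers any remainder.
        return ApcSplit(library_pays=first, author_pays=remainder)
    # Author cannot contribute: the libraries pay the entire APC.
    return ApcSplit(library_pays=apc, author_pays=0)


if __name__ == "__main__":
    # Nature Communications example from this post: $6,290 APC, capped title.
    print(split_uc_agreement_apc(6_290, author_has_funds=False, capped_title=True))
    # A hypothetical $3,000 APC in an uncapped title, author without grant funds.
    print(split_uc_agreement_apc(3_000, author_has_funds=False))
```

For the capped Nature Communications case, the model gives $1,000 from the libraries and $5,290 from the author; for an uncapped title where the author has no grant funds, the libraries cover the full amount.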
The UC maintains an updated list of Publisher OA Agreements and Discounts where you can explore which journals are available for partial or full APC coverage under the open access agreements.
The UC Berkeley Library-specific fund can reimburse open access fees for other fully open access journals
UC Berkeley’s Library also has a campus open access fund that UCB authors can use if they are publishing in a fully open access journal and are required to pay an APC. The Berkeley Research Impact Initiative (BRII) is open to any current UC Berkeley faculty, graduate student, postdoc, or academic staff who does not have other sources of funds to pay article processing charges. The BRII fund is available for journals other than those with which the UC has entered into a systemwide transformative open access agreement.
For BRII APC coverage to apply, the entire journal must be freely available to the public without subscription fees. BRII cannot cover fees for publishing in “hybrid” OA journals—which are subscription-based journals that only offer open access options if an author decides to pay an additional fee to make their individual article open access. BRII reimbursements are capped at $2,500 per article, and a UC Berkeley author can use BRII funds once per fiscal year.
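The BRII eligibility rules and per-article cap described above can likewise be expressed as a short check. This Python sketch is a simplified reading of the rules in this post, not the program's official criteria: the parameter names are hypothetical, amounts are whole dollars, and edge cases (such as book processing charges) are not modeled. The example uses the JAMA Network Open figures cited in this post.

```python
# BRII per-article cap stated in this post.
BRII_APC_CAP = 2_500


def brii_reimbursement(
    apc: int,
    *,
    fully_open_access: bool,
    covered_by_uc_agreement: bool,
    used_brii_this_fiscal_year: bool,
    has_other_funds: bool,
) -> int:
    """Return the BRII reimbursement (whole dollars) for an APC, or 0 if ineligible.

    Simplified reading of the eligibility rules described in this post.
    """
    eligible = (
        fully_open_access                    # hybrid journals are excluded
        and not covered_by_uc_agreement      # UC-agreement journals use that route instead
        and not used_brii_this_fiscal_year   # one use per author per fiscal year
        and not has_other_funds              # BRII is for authors without other funds
    )
    return min(apc, BRII_APC_CAP) if eligible else 0


if __name__ == "__main__":
    # JAMA Network Open example from this post: $3,000 APC in a fully OA journal.
    apc = 3_000
    paid = brii_reimbursement(apc, fully_open_access=True, covered_by_uc_agreement=False,
                              used_brii_this_fiscal_year=False, has_other_funds=False)
    print(f"BRII reimburses ${paid}; author owes ${apc - paid}")
```

Under these assumptions, a $3,000 APC yields a $2,500 reimbursement and a $500 balance for the author, while an article in a hybrid journal yields no BRII reimbursement at all.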
How to plan in advance
If your research is grant funded, it is important to think about publishing costs at the beginning of your research cycle and account for them in your grant applications and annual research budgeting. For grant recipients (such as researchers with funding from NIH, NSF, etc.), open access publishing costs generally are considered an allowable direct expense unless funders explicitly prohibit them. For more information on how and why to plan in advance, check out the Open Access Fact Sheet for Researchers Applying for Grants.

Planning in advance allows you to be a partner in the publishing process. The UC libraries can cover the first $1,000 of your article processing charge and, where possible, you can use grant or research funds to cover the rest. The more researchers are able to contribute, the further the UC agreements can go in publishing more articles open access, and the better the UC libraries are able to provide financial support to researchers who do not have access to grant funds.
Most of the UC transformative open access agreements are set up to cover the full article processing charge if UC authors do not have research or grant funds to contribute to making their journal articles open access. But there are a few journal titles and series within transformative agreements for which the libraries were unable to negotiate full coverage. For example, if a UC author has an article accepted in Nature Communications, the UC libraries cover only the first $1,000 of the article processing charge under the terms of the UC-Springer Nature transformative open access agreement. Since the current APC for Nature Communications is $6,290, the UC author must pay the remainder of the fee ($5,290).
Another instance in which an author may need to pay a balance is when the author is publishing in a fully open access journal that is not covered by any transformative agreement, and that journal’s article processing charge exceeds what the BRII program can cover. For instance, if a UC Berkeley author has an article accepted for publication in JAMA Network Open, the BRII program covers at most $2,500 of the article processing charge. Since the APC for JAMA Network Open is $3,000, the UC Berkeley author must pay the remainder of the fee ($500).
Because both journals in the examples above require an APC for publication, authors are responsible for securing the remainder of any publishing fees when open access publication costs exceed the support available from the UC libraries (or the UC Berkeley Library).
Need more help?
- Contact the Library’s Office of Scholarly Communication Services at schol-comm@berkeley.edu
- Read the Open Access Fact Sheet for Researchers Applying for Grants
- Explore the Open Access at UC webpage
- Explore the Open Access at Berkeley webpage
- Watch the YouTube workshop video How to Publish Open Access at UC Berkeley
- Visit the Berkeley Research Impact Initiative (BRII) website
AOQU (Achilles Orlando Quixote Ulysses). Rivista di epica
Fall 2022 copyright and publishing workshops with the Office of Scholarly Communication Services
With the school year kicking off this week in Berkeley, the Office of Scholarly Communication Services is here to help UC Berkeley faculty, students, and staff understand copyright and scholarly publishing with online resources, Zoom workshops, and consultations.
Here’s what’s coming up this semester.
Workshops
Publish Digital Books & Open Educational Resources with Pressbooks
Date/Time: Tuesday, September 20, 2022, 11:00am–12:30pm
RSVP for Zoom link
If you’re looking to self-publish work of any length and want an easy-to-use tool that offers a high degree of customization, allows flexibility with publishing formats (EPUB, PDF), and provides web-hosting options, Pressbooks may be great for you. Pressbooks is often the tool of choice for academics creating digital books, open textbooks, and open educational resources, since you can license your materials for reuse however you desire. Learn why and how to use Pressbooks for publishing your original books or course materials. You’ll leave the workshop with a project already under way.
Copyright and Your Dissertation
Date/Time: Tuesday, September 27, 2022, 11:00am–12:30pm
RSVP for Zoom link
This workshop will provide you with practical guidance for navigating copyright questions and other legal considerations for your dissertation or thesis. Whether you’re just starting to write or you’re getting ready to file, you can use our tips and workflow to figure out what you can use, what rights you have as an author, and what it means to share your dissertation online.
Managing and Maximizing Your Scholarly Impact
Date/Time: Tuesday, October 11, 2022, 11:00am–12:30pm
RSVP for Zoom link
This workshop will provide you with practical strategies and tips for promoting your scholarship, increasing your citations, and monitoring your success. You’ll also learn how to understand metrics, use scholarly networking tools, evaluate journals and publishing options, and take advantage of funding opportunities for Open Access scholarship.
From Dissertation to Book: Navigating the Publication Process
Date/Time: Tuesday, October 18, 2022, 11:00am–12:30pm
RSVP for Zoom link
Hear from a panel of experts—an acquisitions editor, a first-time book author, and an author rights expert—about the process of turning your dissertation into a book. You’ll come away from this panel discussion with practical advice about revising your dissertation, writing a book proposal, approaching editors, signing your first contract, and navigating the peer review and publication process.
How to Publish Open Access at UC Berkeley
Date/Time: Tuesday, October 25, 2022, 11:00am–12:30pm
RSVP for Zoom link
Are you wondering what processes, platforms, and funding are available at UC Berkeley to publish your research open access (OA)? This workshop will provide practical guidance and walk you through all of the OA publishing options and funding sources you have on campus. We’ll explain: the difference between (and mechanisms for) self-depositing your research in the UC’s institutional repository vs. choosing publisher-provided OA; what funding is available to put toward your article or book charges if you choose a publisher-provided option; and the difference between funding coverage under the UC’s “transformative agreements” vs. the Library’s funding program (Berkeley Research Impact Initiative). We’ll also give you practical tips and tricks to maximize your retention of rights and readership in the publishing process.
Copyright and Fair Use for Digital Projects
Date/Time: Tuesday, November 8, 2022, 11:00am–12:30pm
RSVP for Zoom link
This training will help you navigate the copyright, fair use, and usage rights of including third-party content in your digital project. Whether you seek to embed video from other sources for analysis, post material you scanned from a visit to the archives, add images, upload documents, or more, understanding the basics of copyright and discovering a workflow for answering copyright-related digital scholarship questions will make you more confident in your project. We will also provide an overview of your intellectual property rights as a creator and ways to license your own work.
Other ways we can help
In addition to the workshops, we’re here to help answer a variety of questions you might have on intellectual property, digital publishing, and information policy.
- Check out our website for information on issues such as copyright and fair use, text data mining, and how to participate in UC’s Open Access Policy.
- Interested in publishing your research Open Access? UCB Library can help defray the costs of an article processing charge (up to $2,500) or book processing charge (up to $10,000). See the Berkeley Research Impact Initiative (BRII) for more information. And explore the various UC-wide transformative open access agreements and discounts that can help UC corresponding authors publish their scholarship open access.
- Do you want to create an open digital textbook? Take a look at UC Berkeley’s Open Book Publishing platform (anyone with an @berkeley.edu email can sign up for a free account), and get in touch with us about our Open Educational Resources (OER) grant program.
- Keep an eye on our events calendar for more workshops and trainings.
- Follow our blog, social media, and YouTube channel.
Want help or more information? Send us an email. We can provide individualized support and personal consultations, online class instruction, presentations and workshops for small or large groups & classes, and customized support and training for departments and disciplines.
UC Berkeley Library and Internet Archive co-directing project to help text data mining researchers navigate cross-border legal and ethical issues
We are excited to announce that the National Endowment for the Humanities (NEH) has awarded nearly $50,000 to UC Berkeley Library and Internet Archive to study legal and ethical issues in cross-border text data mining. The funding was made possible through NEH’s Digital Humanities Advancement Grant program.
NEH funding for the project, entitled Legal Literacies for Text Data Mining – Cross Border (“LLTDM-X”), will support research and analysis to address law and policy issues faced by U.S. digital humanities practitioners whose text data mining research and practice intersects with foreign-held or -licensed content, or involves international research collaborations.
LLTDM-X builds upon the highly successful Building Legal Literacies for Text Data Mining Institute (Building LLTDM), previously funded by the NEH in 2019. UC Berkeley Library directed Building LLTDM in June 2020, bringing together expert faculty from across the country to train 32 digital humanities researchers on how to navigate law, policy, ethics, and risk within text data mining projects. (All of the results and impacts are summarized in the project’s white paper.)
In Building LLTDM’s instructional sessions and post-workshop evaluations, participants identified cross-border research collaborations as an ongoing and critical legal and policy problem, and noted that foreign law and ethics issues pervaded their research. UC Berkeley Library’s Office of Scholarly Communication Services partnered with Internet Archive to begin addressing these essential needs, and LLTDM-X sprang to life.
Why is LLTDM-X needed?
Text data mining, or TDM, is an increasingly essential and widespread research approach. TDM relies on automated techniques and algorithms to extract revelatory information from large sets of unstructured or thinly-structured digital content. These methodologies allow scholars to identify and analyze critical social, scientific, and literary patterns, trends, and relationships across volumes of data that would otherwise be impossible to sift through.
While TDM methodologies offer great potential, they also present scholars with nettlesome law and policy challenges that can prevent them from understanding how to move forward with their research. Building LLTDM trained TDM researchers and professionals on essential principles of copyright, licensing, and privacy law, as well as ethics—thereby helping them move forward with impactful digital humanities research.
As Building LLTDM revealed, U.S. digital humanities scholars do not conduct text data mining research only in or about the U.S. Further, digital humanities research in particular is marked by collaboration across institutions and geographical boundaries. As a result, U.S. practitioners encounter a growing number of increasingly complex cross-border problems.
For example, in the U.S., contract terms may override rights under copyright, such that a U.S. database license agreement may prohibit text data mining and other fair uses, whereas UK licenses cannot override the UK’s statutory text and data mining exception. Therefore, U.S. TDM practitioners collaborating with UK-based colleagues face consequential choices about which agreements to apply, as this may determine whether text data mining is permitted. In the U.S., “breaking” technological protection measures to conduct text data mining is now authorized within certain parameters, yet other jurisdictions prohibit such work or apply different conditions. U.S. text data mining researchers must accordingly consider how they work with internationally held or licensed materials or collaborators.
Scholars must parse at least three “cross-border” TDM scenarios: (i) the materials they want to mine are housed in a foreign jurisdiction, or are otherwise subject to foreign database licenses or laws; (ii) the human subjects they are studying, or who created the underlying content, reside in another country; or (iii) the colleagues with whom they are collaborating reside abroad. Each scenario creates uncertainty about which country’s laws, agreements, and policies apply.
U.S. researchers are uncertain about how to navigate each of these scenarios. In an informal survey that we conducted with digital humanities scholars, 70% of respondents reported cross-border copyright questions, 72% reported uncertainty about cross-border licensing terms, 52% noted privacy issues, and 48% identified ethical concerns. This confusion greatly affected their TDM research: twenty-eight percent (28%) of respondents confirmed that these cross-border copyright, licensing, privacy, or ethical issues impeded their project or prevented it entirely. Of equal concern, 40% of responding practitioners reported hesitating to share their workflows, methodology, or sources because of possible cross-border legal and ethical issues. Without transparency, findings may be deemed unreliable and scholarship may be rejected for publication. These problems will only mount given the increasingly collaborative nature of research and the substantial amount of cross-border research occurring.
How will LLTDM-X help the world?
Our long-term goal is to design instructional materials and institutes to support digital humanities TDM scholars facing cross-border issues, but our first step with LLTDM-X is getting a better handle on the specific law and policy challenges they face.
Through a series of virtual roundtable discussions, and accompanying legal research and analysis, LLTDM-X will surface these cross-border issues and begin to distill preliminary guidance to help scholars navigate them.
The first roundtable will engage U.S. digital humanities text data mining practitioners in sharing their cross-border TDM experiences. U.S. and global law and ethics experts will help guide the roundtable discussion to elicit the contours of practitioner experiences. During two subsequent roundtables—one focusing on cross-border copyright and licensing, and another on cross-border privacy and ethics—the experts will discuss practitioners’ hurdles in depth, and begin to develop customized guidance.
After the roundtables, we will work with the law and ethics experts to create instructive case studies that reflect the types of cross-border TDM issues practitioners encountered. These case studies will incorporate recommendations to help a broad audience of U.S. digital humanities text data mining practitioners navigate these cross-border concerns. Case studies, guidance, and recommendations will be widely disseminated via an open access report to be published at the completion of the project. And most importantly, they will be used to inform our future educational offerings.
An experienced team
The team for LLTDM-X (introduced below) is eager to get started. The project is co-directed by Thomas Padilla, Deputy Director, Archiving and Data Services at Internet Archive, who notes:
“LLTDM-X responds strategically to a pervasive challenge that needlessly complicates, inhibits, and weakens the fullest potential of research. This work paves a critical path toward building future training institutes that address cross-border legal issues in TDM. At Internet Archive we’re committed to supporting universal access to all knowledge—LLTDM-X couldn’t be more clearly aligned with what we hope to achieve. We look forward to working with our partners at UC Berkeley Library and the wider community to advance this work.”
Rachael Samberg, who leads UC Berkeley Library’s Office of Scholarly Communication Services and oversaw Building LLTDM, joins Thomas as co-director and explains:
“We are ready to begin analyzing and sorting out the complex legal challenges for digital humanities TDM researchers. We’ve already secured an incredible group of international legal and ethics experts to conduct the analyses, and will share more on that soon. In the meantime, we are gearing up to build out an even larger group of participating scholars whose experiences will help us create case studies.”
On behalf of the entire project team, we would like to thank NEH’s Office of Digital Humanities again for funding this important work. We invite you to contact us with any questions you may have.
Thomas Padilla (Project Director): Thomas is Deputy Director, Archiving and Data Services at Internet Archive, and has deep experience cultivating the ability of libraries, archives, and museums to support TDM research. He has previously served as Principal Investigator of the Andrew W. Mellon Foundation-supported Collections as Data: Part to Whole and the Institute of Museum and Library Services-supported Always Already Computational: Collections as Data, and as author of the library community research agenda Responsible Operations: Data Science, Machine Learning, and AI in Libraries. In addition, Padilla served as expert faculty for Building LLTDM, the precursor to LLTDM-X.
Rachael Samberg (Project Co-Director): Rachael is Scholarly Communication Officer & Program Director of the University of California, Berkeley Library’s Office of Scholarly Communication Services. She served as Project Director and legal expert for Building LLTDM. A Duke Law graduate, Rachael practiced intellectual property litigation at Fenwick & West LLP for seven years before spending six years at Stanford Law School’s library, where she was Head of Reference & Instructional Services and a Lecturer in Law. Rachael speaks throughout the country about copyright and TDM issues and has published widely on them. Her chapter, Law & Literacy in Non-Consumptive Text Mining, was published in Copyright Conversations (ALA, 2019).
Stacy Reardon (Project Team Member): Stacy Reardon is Literatures and Digital Humanities Librarian at the University of California, Berkeley Library, where she provides guidance and instruction on digital humanities projects and methods. Stacy served as a library expert on the Project Team for the NEH-funded Building Legal Literacies for Text Data Mining. She is co-chair of UC Berkeley’s Digital Humanities Working Group, and received her Ph.D. in literature from the University of Massachusetts, Amherst.
Timothy Vollmer (Project Manager): Timothy Vollmer is Scholarly Communication and Copyright Librarian at UC Berkeley Library. He served as Project Manager for the NEH-funded Building Legal Literacies for Text Data Mining. Tim previously worked as a senior public policy manager for Creative Commons, and contributed to writing and advocacy on the text data mining exceptions in the EU’s Directive on Copyright in the Digital Single Market. He was formerly Assistant Director of the Program on Public Access to Information at the American Library Association.
Back in action with your scholarship

As the school year restarts in Berkeley, we know the pandemic is not over. But the Office of Scholarly Communication Services is here to help UC Berkeley faculty, students, and staff understand copyright and scholarly publishing with online resources, Zoom workshops, and virtual consultations.
If you’re interested in a recap of our progress and achievements over the last year, check out our 2020-21 annual report.
Here’s what’s coming up this semester.
Upcoming Workshops
Publish Digital Books and Open Educational Resources with Pressbooks
September 14, 2021
11:00am–12:30pm
RSVP
If you’re looking to self-publish work of any length and want an easy-to-use tool that offers a high degree of customization, allows flexibility with publishing formats (EPUB, PDF), and provides web-hosting options, Pressbooks may be great for you. Pressbooks is often the tool of choice for academics creating digital books, open textbooks, and open educational resources, since you can license your materials for reuse however you desire. Learn why and how to use Pressbooks for publishing your original books or course materials. You’ll leave the workshop with a project already under way! Sign up via the RSVP link, and the Zoom login details will be emailed to you.
Copyright and Your Dissertation
October 25, 2021
1:00pm–2:30pm
RSVP
This workshop will provide you with practical guidance for navigating copyright questions and other legal considerations for your dissertation or thesis. Whether you’re just starting to write or you’re getting ready to file, you can use our tips and workflow to figure out what you can use, what rights you have as an author, and what it means to share your dissertation online.
From Dissertation to Book: Navigating the Publication Process
October 26, 2021
1:00pm–2:30pm
RSVP
Hear from a panel of experts—an acquisitions editor, a first-time book author, and an author rights expert—about the process of turning your dissertation into a book. You’ll come away from this panel discussion with practical advice about revising your dissertation, writing a book proposal, approaching editors, signing your first contract, and navigating the peer review and publication process.
Managing and Maximizing Your Scholarly Impact
October 28, 2021
1:00pm–2:30pm
RSVP
This workshop will provide you with practical strategies and tips for promoting your scholarship, increasing your citations, and monitoring your success. You’ll also learn how to understand metrics, use scholarly networking tools, evaluate journals and publishing options, and take advantage of funding opportunities for Open Access scholarship.
Copyright and Fair Use for Digital Projects
November 10, 2021
11:00am–12:30pm
RSVP
This training will help you navigate the copyright, fair use, and usage rights questions involved in including third-party content in your digital project. Whether you seek to embed video from other sources for analysis, post material you scanned from a visit to the archives, add images, upload documents, or more, understanding the basics of copyright and discovering a workflow for answering copyright-related digital scholarship questions will make you more confident in your project. We will also provide an overview of your intellectual property rights as a creator and ways to license your own work.
Other ways we can help
We’re here to help answer a variety of questions you might have on intellectual property, digital publishing, and information policy.
- Check out our website for information on issues such as copyright and fair use, the scholarly publishing lifecycle, and UC’s Open Access Policy.
- Interested in publishing your research Open Access? UCB Library can help defray the costs of an article processing charge (up to $2,500) or book processing charge (up to $10,000). See the Berkeley Research Impact Initiative (BRII) for more information. There are also opportunities to publish open access via one of the UC’s Transformative Open Access Agreements.
- Do you want to create an open digital textbook? Take a look at UC Berkeley’s Open Book Publishing platform (anyone with an @berkeley.edu email address can sign up for a free account), and get in touch with us about our Open Educational Resources (OER) grant program. Also check out our recent blog post highlighting open access books recently made possible via the Library.
- Keep an eye on our events calendar for more workshops and trainings.
- Follow our blog and social media.
Want help or more information? Send us an email at schol-comm@berkeley.edu. We can provide individualized support and personal consultations, online class instruction, and customized support and training for departments.
Now available: Open educational resource of Building Legal Literacies for Text Data Mining
Last summer we hosted the Building Legal Literacies for Text Data Mining institute. We welcomed 32 digital humanities researchers and professionals to the weeklong virtual training, with the goal of empowering them to confidently navigate law, policy, ethics, and risk within digital humanities text data mining (TDM) projects. Building Legal Literacies for Text Data Mining (Building LLTDM) was made possible through a grant from the National Endowment for the Humanities.
Following the remote institute in June 2020, the participants and project team reconvened in February 2021 to discuss how participants had been thinking about, performing, or supporting TDM in their home institutions and projects with the law and policy literacies in mind.
To maximize the reach and impact of Building LLTDM, we have now published a comprehensive open educational resource (OER) of the contents of the institute. The OER covers copyright (both U.S. and international law), technological protection measures, privacy, and ethical considerations. It also helps other digital humanities professionals and researchers run their own similar institutes by describing in detail how we developed and delivered programming (including our pedagogical reflections and take-aways), and includes ideas for hosting shorter literacy teaching sessions. The resource (available as a web-book or in downloadable formats including PDF and EPUB) is in the public domain under the CC0 Public Domain Dedication, meaning it can be accessed, reused, and repurposed without restriction.
In addition to the OER, we’ve also published a white paper that describes the institute’s origins and goals, project overview and activities, and reflections and possible follow-on actions.
Thank you to the National Endowment for the Humanities, the project team, institute participants, and staff at the UC Berkeley Library for making Building LLTDM a success.
[Note: this content is cross-posted on the LLTDM blog.]
Upcoming workshop on how to share and publish data
Are you unsure about how you can use or reuse other people’s data in your teaching or research, and what the terms and conditions are? Do you want to share your data with other researchers or license it for reuse but are wondering whether and how that’s allowed? Do you have questions about university or granting agency data ownership and sharing policies, rights, and obligations? We will provide clear guidance on all of these questions and more in this interactive webinar on the ins and outs of data sharing and publishing.
Join the Library’s Office of Scholarly Communication Services and the Research Data Management Program as we:
- Explore venues and platforms for sharing and publishing data
- Unpack the terms of contracts and licenses affecting data reuse, sharing, and publishing
- Help you understand how copyright does (and does not) affect what you can do with the data you create or wish to use from other people
- Consider how to license your data for maximum downstream impact and reuse
- Demystify data ownership and publishing rights and obligations under university and grant policies
Intended audiences include faculty, grad students, post-docs, instructors, and academic support staff, but anyone interested is welcome to attend.