UC Berkeley Library to Copyright Office: Protect fair uses in AI training for research and education

Madison Building, Library of Congress
Copyright Matt H. Wade, licensed CC-BY-NC-SA 3.0

We are pleased to share the UC Berkeley Library’s response to the U.S. Copyright Office’s Notice of Inquiry regarding artificial intelligence and copyright. Our response addresses the essential fair use right relied upon by UC Berkeley scholars in undertaking groundbreaking research, and the need to preserve access to the underlying copyright-protected content so that scholars using AI systems can conduct research inquiries.

In this blog post, we explain what the Copyright Office is studying, and why it was important for the Library to make scholars’ voices heard.

What the Copyright Office is studying and why

Loosely speaking, the Copyright Office wants to understand how to set policy for copyright issues raised by artificial intelligence (“AI”) systems.

Over the last year, AI systems and the rapid growth of their capabilities have attracted significant attention. One type of AI, referred to as “generative AI,” is capable of producing outputs such as text, images, video, or audio (including emulating a human voice) that would be considered copyrightable if created by a human author. These systems include, for instance, the chatbot ChatGPT and text-to-image generators like DALL·E, Midjourney, and Stable Diffusion. A user can prompt ChatGPT to write a short story about a duck and a frog who are best friends, or prompt DALL·E to create an abstract image in the style of a Jackson Pollock painting. Generative AI systems are relevant to, and affect, many educational activities on a campus like UC Berkeley, but (at least to date) they have not been the key facilitator of campus research methodologies.

Instead, in the context of research, scholars have been relying on AI systems to support a set of research methodologies referred to as “text and data mining” (or TDM). TDM uses computational tools, algorithms, and automated techniques to extract revelatory information from large sets of unstructured or thinly-structured digital content. Imagine you have a book like “Pride and Prejudice.” Depending on your scholarly inquiry, a nearly infinite amount of information is stored inside that book: how many female vs. male characters there are, what types of words the female characters use as opposed to the male characters, what types of behaviors the female characters display relative to the males, and so on. TDM allows researchers to identify and analyze patterns, trends, and relationships across volumes of data that would be impossible to sift through by closely examining one book or item at a time.

Not all TDM research methodologies require AI systems to extract this information. For instance, as in the “Pride and Prejudice” example above, TDM can sometimes be performed by developing algorithms to detect the frequency of certain words within a corpus, or to parse sentiments based on the proximity of various words to each other. In other cases, though, scholars must use machine learning techniques to train AI models before the models can make a variety of assessments.

Here is an illustration of the distinction: Imagine a scholar wishes to assess how often 20th-century fiction authors write about notions of happiness. The scholar likely would compile a corpus of thousands or tens of thousands of works of fiction, and then run a search algorithm across the corpus to detect the occurrence or frequency of words like “happiness,” “joy,” “mirth,” “contentment,” and synonyms and variations thereof. But if the scholar instead wanted to establish the presence of fictional characters who embody or display characteristics of being happy, the scholar would need to employ discriminative modeling (a classification and regression technique) that can train AI to recognize the appearance of happiness by looking for recurring indicia of character psychology, behavior, attitude, conversational tone, demeanor, appearance, and more. This does not use a generative AI system to create new outputs; rather, it trains a non-generative AI system to predict or detect existing content. And to undertake this type of non-generative AI training, a scholar would need to use a large volume of often copyright-protected works.
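For readers curious about the mechanics, here is a minimal Python sketch of both approaches, offered as an illustration under stated assumptions rather than as any particular scholar's method. The term list `HAPPINESS_TERMS`, the function names, and the tiny labeled phrases are all hypothetical. The first part simply counts keywords across a corpus. The second trains a small discriminative classifier (logistic regression over word counts, fit with plain stochastic gradient descent) that learns from labeled examples which words signal a happy character. A real study would use a large corpus, carefully curated term lists, thousands of labeled passages, richer features, and typically an established machine learning library.

```python
import math
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return WORD_RE.findall(text.lower())

# Approach 1: keyword frequency (no model training needed).
# Hypothetical seed list; a real study would curate synonyms and variants.
HAPPINESS_TERMS = frozenset({"happiness", "happy", "joy", "joyful",
                             "mirth", "contentment", "contented"})

def term_counts(corpus, terms=HAPPINESS_TERMS):
    """Map each document title to a Counter of the matching terms it contains."""
    return {title: Counter(w for w in tokenize(text) if w in terms)
            for title, text in corpus.items()}

# Approach 2: a discriminative classifier (logistic regression on word counts).
def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_classifier(examples, epochs=200, lr=0.5):
    """Fit word weights and a bias by stochastic gradient descent.

    examples: (passage, label) pairs; label 1 = passage depicts a happy
    character, 0 = it does not. Returns (weights, bias).
    """
    weights, bias = {}, 0.0
    data = [(Counter(tokenize(text)), label) for text, label in examples]
    for _ in range(epochs):
        for feats, label in data:
            z = bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())
            err = label - sigmoid(z)  # gradient of the log-likelihood w.r.t. z
            bias += lr * err
            for w, c in feats.items():
                weights[w] = weights.get(w, 0.0) + lr * err * c
    return weights, bias

def predict_happy(model, text):
    """Estimated probability (0 to 1) that the passage depicts happiness."""
    weights, bias = model
    feats = Counter(tokenize(text))
    z = bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())
    return sigmoid(z)
```

The keyword counter needs no training data, whereas the classifier can only recognize what its labeled training passages teach it; trained on the toy phrases “laughing warmly” and “grinning broadly” (happy) versus “weeping quietly” and “scowling darkly” (not happy), it rates “laughing broadly” as likely happy.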

The Copyright Office is studying both of these kinds of AI systems: generative AI and non-generative AI. The Office is asking a variety of questions in response to having been contacted by stakeholders across sectors and industries with diverse views about how AI systems should be regulated. Some of the concerns expressed by stakeholders include:

  • Who is the “author” of generative AI outputs?
  • Should people whose voices or images are used to train generative AI systems have a say in how their voices or images are used? 
  • Should the creator of an AI system (whether generative or non-generative) need permission from copyright holders to use copyright-protected materials in training the AI to predict and detect things?
  • Should copyright owners get to opt out of having their content used to train AI? Should ethics be considered within copyright regulation?

Several of these questions are already the subject of pending litigation. While these questions are being explored by the courts, the Copyright Office wants to understand the entire landscape better as it considers what kinds of AI copyright regulations to enact.

The copyright law and policy landscape underpinning the use of AI models is complex, and whatever regulatory decisions the Copyright Office makes will have ramifications for global enterprise, innovation, and trade. The Copyright Office’s inquiry thus raises significant and timely legal questions, many of which we are only beginning to understand.

For these reasons, the Library has taken a cautious and narrow approach in its response to the inquiry: we address only two key principles of fair use and licensing, as these issues bear upon the nonprofit education, research, and scholarship undertaken by scholars who rely on (typically non-generative) AI models. In brief, the Library wants to ensure that (1) the voices of scholars, and of the academic libraries that support them, are heard to preserve fair use in training AI, and that (2) copyright-protected content remains available for AI training to support nonprofit education and research.

Why the study matters for fair use

Previous court cases like Authors Guild v. HathiTrust, Authors Guild v. Google, and A.V. ex rel. Vanderhye v. iParadigms have addressed fair use in the context of TDM and determined that the reproduction of copyrighted works to create and text mine a collection of copyright-protected works is a fair use. These cases further hold that making derived data, results, abstractions, metadata, or analysis from the copyright-protected corpus available to the public is also fair use, as long as the research methodologies or data distribution processes do not re-express the underlying works to the public in a way that could supplant the market for the originals. Performing all of this work is essential for TDM-reliant research studies.

For the same reasons that the TDM process is fair use of copyrighted works, the training of AI tools to do that TDM should also be fair use, in large part because training does not reproduce or communicate the underlying copyrighted works to the public. Here, there is an important distinction to make between training inputs and outputs, in that whether a given generative AI output is a fair use cannot always be predicted in advance. The mechanics of generative models’ operations suggest that there are limited instances in which generative AI outputs could indeed be substantially similar to (and potentially infringing of) the underlying works used for training; this substantial similarity is typically possible only when a training corpus is rife with numerous copies of the same work. However, the training of AI models by using copyright-protected inputs falls squarely within what courts have determined to be a transformative fair use, especially when that training is for nonprofit educational or research purposes. And it is essential to protect the fair use rights of scholars and researchers to make these uses of copyright-protected works when training AI.

Further, were these fair use rights overridden by limiting AI training access to only “safe” materials (like public domain works or works for which training permission has been granted via license), this would exacerbate bias in the nature of research questions able to be studied and the methodologies available to study them, and amplify the views of an unrepresentative set of creators given the limited types of materials available with which to conduct the studies.

Why access to AI training content should be preserved

For the same reasons, it is important that scholars’ ability to access the underlying content to conduct AI training be preserved. The fair use provision of the Copyright Act does not give copyright owners a right to opt out of allowing other people to use their works, and for good reason: if content creators were able to opt out, the provision for fair use would be undermined, and little content would be available to build upon for the advancement of science and the useful arts. Accordingly, to the extent that the Copyright Office is considering creating a regulatory right for creators to opt out of having their works included in AI training, it is paramount that any such opt-out provision not be extended to AI training or activities that constitute fair use, particularly in the nonprofit educational and research contexts.

AI training opt-outs would be a particular threat for research and education because fair use in these contexts is already becoming an out-of-reach luxury even for the wealthiest institutions. Academic libraries are forced to pay significant sums each year to try to preserve fair use rights for campus scholars through the database and electronic content license agreements that libraries sign. In the U.S., the prospect of “contractual override” means that, although fair use is statutorily provided for, private parties (like publishers) may “contract around” fair use by requiring libraries to negotiate for otherwise lawful activities (such as conducting TDM or training AI for research), and often to pay additional fees for the right to conduct these lawful activities on top of the cost of licensing the content itself. When such costs are beyond institutional reach, the publisher or vendor may then offer similar contractual terms directly to research teams, who may feel obliged to agree in order to get access to the content they need. Vendors may charge tens or even hundreds of thousands of dollars for this type of access.

This “pay-to-play” landscape of charging institutions for the opportunity to rely on existing statutory rights is particularly detrimental for TDM research methodologies, because TDM research often requires use of massive datasets with works from many publishers, including copyright owners who cannot be identified or are unwilling to grant licenses. If the Copyright Office were to enable rightsholders to opt out of having their works fairly used for training AI, then academic institutions and scholars would face even greater hurdles in licensing content for research purposes.

First, it would be operationally difficult for academic publishers and content aggregators to amass and license the “leftover” body of copyrighted works that remain eligible for AI training. Costs associated with publishers’ efforts in compiling “AI-training-eligible” content would be passed along as additional fees charged to academic libraries. In addition, rightsholders might opt out of allowing their work to be used for AI training fair uses, and then turn around and charge AI usage fees to scholars (or libraries)—essentially licensing back fair uses for research. These scenarios would impede scholarship by or for research teams who lack grant or institutional funds to cover these additional expenses; penalize research in or about underfunded disciplines or geographical regions; and result in bias as to the topics and regions studied. 

Scholars need to be able to utilize existing knowledge resources to create new knowledge goods. Congress and the Copyright Office clearly understand the importance of facilitating access and usage rights, having implemented the statutory fair use provision without any exclusions or opt-outs. This status quo should be preserved for fair use AI training—and particularly in the nonprofit educational or research contexts. 

Our office is here to help

No matter what happens with the Copyright Office’s inquiry and any regulations that ultimately may be established, the UCB Library’s Office of Scholarly Communication Services is here to help you. We are a team of copyright law and information policy (licensing, privacy, and ethics) experts who help UC Berkeley scholars navigate legal, ethical, and policy considerations in utilizing resources in their research and teaching. And we are national and international leaders in supporting TDM research, offering online tools, trainings, and individual consultations to support your scholarship. Please feel free to reach out to us with any questions at schol-comm@berkeley.edu.

Publish your scholarship like a pro!

Woman wearing gold watch, sitting at table, typing on a Microsoft Surface notebook
Photograph by Women of Color in Tech, CC-BY 2.0.

We’re more than a month into the fall semester, and if you’re a graduate student or postdoc you’ve probably been thinking about some of the milestones on your horizon, from filing your thesis or dissertation to pitching your first book project or looking for a job.

While we can’t write your dissertation or submit your job application for you, the Library can help in other ways! We are collaborating with GradPro to offer a series of professional development workshops for grad students, postdocs, and other early career scholars to guide you through important decisions and tasks in the research and publishing process, from preparing your dissertation to building a global audience for your work.

  • October 22: Copyright and Your Dissertation
  • October 23: From Dissertation to Book: Navigating the Publication Process
  • October 25: Managing and Maximizing Your Scholarly Impact

These sessions are focused on helping early career researchers develop real-world scholarly publishing skills and apply this expertise to a more open, networked, and interdisciplinary publishing environment.

These workshops are also taking place during Open Access Week 2019, an annual global effort to bring attention to Open Access around the world and highlight how the free, immediate, online availability of scholarship can remove barriers to information, support emerging scholarship, and foster the spread of knowledge and innovation.

Below is the list of next week’s workshop offerings. Join us for one workshop or all three! Each session will take place at the Graduate Professional Development Center, 309 Sproul Hall. Please RSVP at the links below.

Light refreshments will be served at all workshops.

If you have any questions about these workshops, please get in touch with schol-comm@berkeley.edu. And if you can’t make it to a workshop but still need help with your publishing, we are always here for you!


Copyright and Your Dissertation

Workshop | October 22 | 1-2:30 p.m. | 309 Sproul Hall

This workshop will provide you with a practical workflow for navigating copyright questions and legal considerations for your dissertation or thesis. Whether you’re just starting to write or you’re getting ready to file, you can use this workflow to figure out what you can use, what rights you have, and what it means to share your dissertation online.

RSVP (Copyright)


From Dissertation to Book: Navigating the Publication Process

Panel Discussion | October 23 | 3-4:30 p.m. | 309 Sproul Hall

Hear from a panel of experts – an acquisitions editor, a first-time book author, and an author rights expert – about the process of turning your dissertation into a book. You’ll come away from this panel discussion with practical advice about revising your dissertation, writing a book proposal, approaching editors, signing your first contract, and navigating the peer review and publication process.

RSVP (Book)


Managing and Maximizing Your Scholarly Impact

Workshop | October 25 | 1-2:30 p.m. | 309 Sproul Hall

This workshop will provide you with practical strategies and tips for promoting your scholarship, increasing your citations, and monitoring your success. You’ll also learn how to understand metrics, use scholarly networking tools, evaluate journals and publishing options, and take advantage of funding opportunities for Open Access scholarship.

RSVP (Impact)

Team Awarded Grant to Help Digital Humanities Scholars Navigate Legal Issues of Text Data Mining

We are thrilled to share that the National Endowment for the Humanities (NEH) has awarded a $165,000 grant to a UC Berkeley-led team of legal experts, librarians, and scholars who will help humanities researchers and staff navigate complex legal questions in cutting-edge digital research.

What is this grant all about?

If you were to crack open some popular English-language novels written in the 1850s (say, ones by Brontë, Hawthorne, Dickens, and Melville), you would find they describe men and women in very different terms. While a male character might be said to “get” something, a female character is more likely to have “felt” it. Whereas the word “mind” might be used when describing a man, the word “heart” is more likely to be used about a woman. Yet, as the 19th century became the 20th, these descriptive differences between genders actually diminished. How do we know all this? We confess we have not actually read every novel ever written between the 19th and 21st centuries (though we’d love to envision a world in which we could). Instead, we can make this assertion because researchers (including David Bamman, of UC Berkeley’s School of Information) used automated techniques to extract information from the novels and analyzed these word usage trends at scale. They crafted algorithms to turn the language of those novels into data about the novels.
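As a rough illustration of what “turning language into data” can mean, here is a deliberately naive Python sketch. It is an assumption for illustration only, not the researchers’ actual method: it counts which word immediately follows “he” or “she” in a text. The function name and the sample sentence in the usage note are hypothetical.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

def words_after_pronouns(text, pronouns=("he", "she")):
    """Count the word that immediately follows each pronoun in the text.

    Naive proxy: the next token is treated as what the text says the
    character does or feels; no parsing or character detection is done.
    """
    tokens = TOKEN_RE.findall(text.lower())
    counts = {p: Counter() for p in pronouns}
    # Walk over adjacent token pairs (token, next_token).
    for token, next_token in zip(tokens, tokens[1:]):
        if token in counts:
            counts[token][next_token] += 1
    return counts
```

On the sentence “He got the letter. She felt its weight. She felt afraid, but he got up.” the sketch credits “got” twice to “he” and “felt” twice to “she.” Published work of this kind typically relies on much richer language processing (for example, parsing sentences to link verbs to the characters they describe) and compares normalized rates across thousands of novels rather than raw counts from one text.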

In fields of inquiry like the digital humanities, the application of such automated techniques and methods for identifying, extracting, and analyzing patterns, trends, and relationships across large volumes of unstructured or thinly-structured digital content is called “text data mining.” (You may also see it referred to as “text and data mining” or “computational text analysis.”) Text data mining provides humanists and social scientists with invaluable frameworks for sifting, organizing, and analyzing vast amounts of material. For instance, these methods make it possible to:

The Problem

Until now, humanities researchers conducting text data mining have had to navigate a thicket of legal issues without much guidance or assistance. For instance, imagine the researchers needed to scrape content about Egyptian artifacts from online sites or databases, or download videos about Egyptian tomb excavations, in order to conduct their automated analysis. And then imagine the researchers also want to share these content-rich data sets with others to encourage research reproducibility or enable other researchers to query the data sets with new questions. This kind of work can raise issues of copyright, contract, and privacy law, not to mention ethics if there are issues of, say, indigenous knowledge or cultural heritage materials plausibly at risk. Indeed, in a recent study of humanities scholars’ text analysis needs, participants noted that access to and use of copyright-protected texts was a “frequent obstacle” in their ability to select appropriate texts for text data mining. 

Potential legal hurdles do not just deter text data mining research; they also bias it toward particular topics and sources of data. In response to confusion over copyright, website terms of use, and other perceived legal roadblocks, some digital humanities researchers have gravitated to low-friction research questions and texts to avoid decision-making about rights-protected data. They use texts that have entered into the public domain or use materials that have been flexibly licensed through initiatives such as Creative Commons or Open Data Commons. When researchers limit their research to such sources, that research is inevitably skewed, leaving important questions unanswered and rendering the resulting findings less broadly applicable. A growing body of research also demonstrates how race, gender, and other biases found in openly available texts have contributed to and exacerbated bias in developing artificial intelligence tools.

The Solution

The good news is that the NEH has agreed to support an Institute for Advanced Topics in the Digital Humanities to help key stakeholders learn to better navigate legal issues in text data mining. Thanks to the NEH’s $165,000 grant, Rachael Samberg of UC Berkeley Library’s Office of Scholarly Communication Services will be leading a national team (identified below) from more than a dozen institutions and organizations to teach humanities researchers, librarians, and research staff how to confidently navigate the major legal issues that arise in text data mining research.

Our institute is aptly called Building Legal Literacies for Text Data Mining (Building LLTDM), and will run from June 23-26, 2020 in Berkeley, California. Institute instructors are legal experts, humanities scholars, and librarians immersed in text data mining research services, who will co-lead experiential meeting sessions empowering participants to put the curriculum’s concepts into action.

In October, we will issue a call for participants, who will receive stipends to support their attendance. We will also be publishing all of our training materials in an openly-available online book for researchers and librarians around the globe to help build academic communities that extend these skills.

Building LLTDM team member Matthew Sag, a law professor at Loyola University Chicago School of Law and leading expert on copyright issues in the digital humanities, said he is “excited to have the chance to help the next generation of text data mining researchers open up new horizons in knowledge discovery. We have learned so much in the past ten years working on HathiTrust [a text-minable digital library] and related issues. I’m looking forward to sharing that knowledge and learning from others in the text data mining community.” 

Team member Brandon Butler, a copyright lawyer and library policy expert at the University of Virginia, said, “In my experience there’s a lot of interest in these research methods among graduate students and early-career scholars, a population that may not feel empowered to engage in ‘risky’ research. I’ve also seen that digital humanities practitioners have a strong commitment to equity, and they are working to build technical literacies outside the walls of elite institutions. Building legal literacies helps ease the burden of uncertainty and smooth the way toward wider, more equitable engagement with these research methods.”

Kyle K. Courtney of Harvard University serves as Copyright Advisor at Harvard Library’s Office for Scholarly Communication, and is also a Building LLTDM team member. Courtney added, “We are seeing more and more questions from scholars of all disciplines around these text data mining issues. The wealth of full-text online materials and new research tools provide scholars the opportunity to analyze large sets of data, but they also bring new challenges having to do with the use and sharing not only of the data but also of the technological tools researchers develop to study them. I am excited to join the Building LLTDM team and help clarify these issues and empower humanities scholars and librarians working in this field.”

Megan Senseney, Head of the Office of Digital Innovation and Stewardship at the University of Arizona Libraries, reflected on the opportunities for ongoing library engagement that extend beyond the initial institute. Senseney said, “Establishing a shared understanding of the legal landscape for TDM is vital to supporting research in the digital humanities and developing a new suite of library services in digital scholarship. I’m honored to work and learn alongside a team of legal experts, librarians, and researchers to create this institute, and I look forward to integrating these materials into instruction and outreach initiatives at our respective universities.”

Next Steps

The Building LLTDM team is excited to begin supporting humanities researchers, staff, and librarians en route to important knowledge creation. Stay tuned if you are interested in participating in the institute. 

In the meantime, please join us in congratulating all the members of the project team:

  • Rachael G. Samberg (University of California, Berkeley) (Project Director)
  • Scott Althaus (University of Illinois, Urbana-Champaign)
  • David Bamman (University of California, Berkeley)
  • Sara Benson (University of Illinois, Urbana-Champaign)
  • Brandon Butler (University of Virginia)
  • Beth Cate (Indiana University, Bloomington)
  • Kyle K. Courtney (Harvard University)
  • Maria Gould (California Digital Library)
  • Cody Hennesy (University of Minnesota, Twin Cities)
  • Eleanor Koehl (University of Michigan)
  • Thomas Padilla (University of Nevada, Las Vegas; OCLC Research)
  • Stacy Reardon (University of California, Berkeley)
  • Matthew Sag (Loyola University Chicago)
  • Brianna Schofield (Authors Alliance)
  • Megan Senseney (University of Arizona)
  • Glen Worthey (Stanford University)

CP2OA results are in: Open access efforts are taking flight

At the Choosing Pathways to Open Access Forum, participants discussed ways to develop plans for repurposing subscription funds to support open access publishing. This photo depicts a sign pointing forum participants to discussion rooms. (Photo by Jami Smith for the UC Berkeley Library)


On October 16-17, 2018, University of California (UC) libraries hosted a working forum in Berkeley, California, called Choosing Pathways to Open Access (CP2OA). Sponsored by the University of California’s Council of University Librarians (CoUL), the forum was designed to enable North American library and consortium leaders and key academic stakeholders to engage in action-focused deliberations about redirecting subscription and other funds toward sustainable open access (OA) publishing.

More than 120 participants arrived from more than 80 institutions, nearly 30 states, and four Canadian provinces. The goal was for everyone to leave with their own customized plans for how they will repurpose subscription spending within their home organizations or communities and, more broadly, through collective efforts, move the OA needle forward.

CP2OA was admittedly a gamble: Could library stakeholders spend two days immersed in a design thinking process, wrestling with the nitty-gritty of numerous OA funding strategies, then depart with actionable steps for making OA a reality? When CoUL approved the forum, they charged the Planning Committee (that’s us) not only with putting the forum together, but also with reporting back to them about whether this grand experiment worked. We have followed up with participants and analyzed the data, and the results are clear: Through CP2OA, the UC libraries have helped to inspire meaningful change.

With that, we hereby announce our Planning Committee’s report to CoUL analyzing forum outcomes. To keep CP2OA momentum going, our report also synthesizes forum outcomes into recommendations for further collective action by CoUL to advance OA. The report’s recommendations reflect our personal opinions as Planning Committee members, and are not an official statement by CoUL, nor should publication of this report be seen as CoUL’s endorsement of our recommendations. We are thrilled that CoUL will be considering our recommendations at its upcoming June meeting, and further note that some of our recommendations reflect efforts already underway within various UC libraries.

We encourage you to check out the full report to see why the format of CP2OA was so successful, and to learn more about everything it inspired. We also understand you may just want the highlights, so … without further ado:

CP2OA Forum Outcomes

Two months after the forum, we surveyed participants about their perceptions of the forum, and any actions they had taken as a result of having participated. Our survey response rate was approximately 48% (58 responses), and revealed the following:

  • Perceptions of the forum were almost universally positive, with some participants describing the forum as “exceptional,” “highly effective,” “energizing and motivating,” and a “model for how we should be engaging professionally.” Participants found the forum structure particularly conducive to enabling action.
  • Though just two months had passed between the CP2OA forum and the time when we polled participants, more than 75% of responding participants reported having taken action toward advancing open access. Fifty percent (50%) of those who took action embarked upon what we categorized as “concrete” actions (that is, express steps such as starting pilots, undertaking publishing data analyses, and negotiating with publishers). The remaining 50% undertook at a minimum conversations and outreach within or external to their libraries.
  • Some examples of concrete next steps included: (1) formation of a group providing consultations and support for transitioning society publications to open access (http://www.tspoa.org); (2) first OA investment by an institution that had not yet formally engaged with OA; (3) commitment to requiring OA in upcoming license negotiations with a STEM publisher; (4) formation of OA values statements to guide institutional investment; (5) pursuit of transformative (e.g., offsetting or “read and publish”) agreements through which an institution’s publications are made OA as part of an overall subscription license agreement; (6) building OA publishing into promotion and tenure considerations; and (7) increased institutional repository deposits and outreach.

Planning Committee’s Recommendations to CoUL

In advance of considering our recommendations this summer, CoUL has already approved some right off the bat, including:

  • Making available the CP2OA Planning Committee’s report and all CP2OA public-facing documentation so that other institutions can have a blueprint for replicating or tailoring CP2OA to their needs. CoUL also approved a second round of CP2OA reporting so the Planning Committee could check in on forum participants’ progress later in the year.
  • Continuing CoUL’s efforts to develop a public toolkit to support other institutions seeking to engage in “big deal” (large subscription journal package) re-negotiations that include OA components, and/or to engage more generally in transformative (e.g., offsetting/read-and-publish) agreements.

In June, CoUL will be addressing the other proposals in our report, including:

  • Engaging the UC academic senate with OA in promotion and tenure
  • Expanding institutional staffing and support for identification and evaluation of, and decision-making relating to, OA publishing investments and transforming the scholarly publishing landscape
  • Dedicating collections funds across campuses to be used for supporting OA publishing
  • Funding new data analyst positions to provide further inward-facing support for data-driven OA investments by UC libraries as well as outward-facing consultative support to the community beyond UC
  • Collective investment in UC Press OA publishing
  • Increasing support for monograph subventions for UC authors
  • Collective investment in transformative cooperatives or non-APC approaches to OA publishing
  • Committing to enhancing eScholarship, including expansion of OA publishing services
  • Exploring opportunities for collective investment in open source infrastructure to support OA publishing

We will keep the community updated about how CoUL responds to these recommendations, as well as any UC collective next steps.

In the meantime, we hope you will share in some of the excitement that CP2OA has generated and continue your own journeys toward helping to transform our scholarly publishing ecosystem.

Onward to open access!


  • Rachael Samberg (UC Berkeley; CP2OA Co-chair)
  • Donald Barclay (UC Merced)
  • John Renaud (UC Irvine)
  • Lisa Schiff (California Digital Library)
  • Allegra Swift (UC San Diego)
  • Anneliese Taylor (UCSF)
  • Mat Willmott (California Digital Library)

Public Domain Day 2019: Explainer

Spilled box of popcorn with tickets and film strip
Image by annca on Pixabay.

Our Library is thrilled to have digitized some unique materials from 1923 and shared them with the world for Public Domain Day 2019. We thought we’d dig deeper here on the Office of Scholarly Communication Services blog about why we were able to do all this without infringing anyone’s copyright.

Grab your popcorn, because here we go!

What is copyright, and what’s the public domain?

Though it can seem daunting, copyright is at its core surprisingly straightforward: Copyright laws give authors of expressive works (imagine: paintings, musical scores, essays, articles, novels, screenplays, and the like) exclusive rights over their creations for limited periods of time. Unless some exception applies (we’ll say more below), the person or entity who holds copyright is the only one who can publish, reproduce, adapt, perform, or display that creative work for as long as the copyright protection lasts. Providing authors and artists these exclusive rights is intended as a reward system to encourage them to write and make things. But these rights do not last indefinitely, because perpetual protection would stymie innovation: other scholars and artists would never be able to build upon existing works.

This time-limited incentive framework originated from Article 1, Section 8 of the Constitution, which empowered Congress to create laws meant to promote the “progress of science” (a phrase the framers intended broadly). The copyright laws that Congress subsequently created grant authors what is often referred to as a “bundle” of time-limited exclusive rights. There are some important exceptions to an author’s exclusive rights, such as fair use, which is intended to promote scholarship and research by allowing otherwise-protected uses of a copyrighted work.

You can begin to see that if a scholar is writing a book or article that reproduces or adapts someone else’s creation, the scholar may need the copyright holder’s permission if the work is still protected, and the scholar’s intended use exceeds what’s considered “fair” (a target sometimes hard to nail down).

A key point in this reward framework is that copyright protection does expire, and when it does, the work enters what is called the “public domain.” Public domain works can be used by anyone for any purpose, without having to ask the author’s permission first. When materials enter the public domain, suddenly it becomes possible to adapt or excerpt them in any fashion without worrying about whether one’s use falls within the fair use exception, or whether the author’s permission is needed.

With an entire year’s worth of U.S. publications now entering the public domain, scholars and artists have a rich new crop of unrestricted content with which to play.

How long does copyright last?

Old alarm clock, white, with rusty hands and bells.
Image by PIRO4D on Pixabay.

What is so special about 1923, and why is it the magic number right now for the public domain? In 1998, Congress amended the copyright laws such that many works published from 1923 through 1977 received an extended grant of copyright protection lasting 95 years from the date of their publication. Because copyright terms run through the end of the calendar year, those 95 years were up for anything published in the United States in 1923 when the clock struck January 1st, 2019. Now, and for the next few decades unless Congress changes the laws again, each time we mark a January 1st, a new year’s worth of once-copyrighted material will enter the public domain.

This January 1st public domain extravaganza will not carry on indefinitely, though. For many U.S. works created after 1977, the length of copyright protection is actually the author’s lifetime plus 70 years–rather than a set 95-year period. That means, to determine the public domain status of a work written in, say, 1985, someone would need to investigate whether the author is still alive and, if not, whether 70 years have transpired since her death. (The period is even longer for corporate-authored works.)
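The arithmetic behind these two rules can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a copyright-status tool: it assumes either a work first published in the United States between 1923 and 1977 that satisfied the notice and renewal formalities, or a work by a single individual author created after 1977, and it ignores corporate-authored works, unpublished works, and works first published abroad. The function names are hypothetical, invented for this example.

```python
def pd_year_published_1923_to_1977(pub_year: int) -> int:
    """Year (as of January 1) a U.S. work published 1923-1977 enters the public domain.

    Assumption: the work satisfied the notice and renewal formalities, so it
    received the full 95-year term measured from publication.
    """
    if not 1923 <= pub_year <= 1977:
        raise ValueError("this sketch only covers works published 1923-1977")
    last_protected_year = pub_year + 95
    # Copyright terms run through December 31, so the work is free the next January 1.
    return last_protected_year + 1


def pd_year_life_plus_70(death_year: int) -> int:
    """Year (as of January 1) a work by a single individual author created
    after 1977 enters the public domain under the life-plus-70 rule."""
    last_protected_year = death_year + 70
    # Same end-of-calendar-year rule as above.
    return last_protected_year + 1


if __name__ == "__main__":
    print(pd_year_published_1923_to_1977(1923))  # 2019
    print(pd_year_life_plus_70(2000))            # 2071
```

Passing 1923 to the first function returns 2019, matching this year’s Public Domain Day; an author who died in 2000 would see her post-1977 works enter the public domain on January 1, 2071.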

Complicating matters even further is the fact that many works published between 1923 and 1977 are already in the public domain, even though 95 years have not yet transpired since their publication. This is because certain procedural requirements applied during that time period that obligated authors to take extra steps to either receive or extend their copyright protection. Here’s an illustration: Imagine two authors each wrote an autobiography in 1977. One of those autobiographies is still protected by copyright, but the other is already in the public domain because the writer failed to publish it with a copyright notice (sometimes designated as “©”), which was a formality required at the time. These days, copyright protection applies automatically, and it is not necessary to include a copyright notice on one’s work or register the work with the U.S. Copyright Office to receive protection.

In fact, some of the items we digitized for our Public Domain Day project were technically already in the public domain! We could have digitized them earlier, but it’s actually quite challenging for individual libraries to research the registration and formalities compliance of materials on an item-by-item basis. Now that it’s 2019, we can safely digitize 1923 publications without having to dig into each item individually. (By the way, HathiTrust, an online digital library, has been trying to distribute the work of identifying more titles from 1923-1977 that have entered the public domain. They have organized a Copyright Review Program to help spread the item-by-item labor across multiple institutions.)

Why does the public domain matter for scholars?

You can start to see that these laws regarding copyright duration can be extraordinarily complex. But sifting through all of them can be critical for campus scholars if they wish to use or republish portions of other people’s creative works in their own scholarly writing. Fortunately, our Office of Scholarly Communication Services helps campus scholars navigate these nuances and evaluate whether their intended uses may fall within the fair use exception.

Certainly, the more material no longer protected by copyright, the clearer everything becomes for scholars seeking to use it. If writers wished to adapt those 1977 autobiographies mentioned above into a movie, or to republish large portions from them in their research, the prospect of doing so becomes a lot easier with the autobiography that has entered the public domain.

Macbook laptop with charts and graphs
Image by rawpixel on Unsplash.

The public domain also offers another boon for scholars: More content to freely text mine.

Text mining describes a research approach in which scholars use automated methods to identify, extract, and analyze patterns and trends in large volumes of digital content. For instance, text mining techniques have made it possible for scholars like UC Berkeley’s David Bamman to extract language from novels to understand how depictions of gender have changed in fiction since the eighteenth century, or to analyze the rhetoric of campaign speeches to make predictive determinations about audience response. Having more material in the public domain can help with that by removing potential copyright barriers as scholars access and republish the text being analyzed.
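As a toy illustration of what “automated methods” can mean in practice, the Python sketch below counts feminine and masculine pronouns in each document of a small collection. It is not David Bamman’s method or any published research pipeline; the word lists and function names are hypothetical choices made only for this example, and real studies rely on far richer features.

```python
import re
from collections import Counter

# Hypothetical word lists chosen only for this illustration.
FEMININE = {"she", "her", "hers", "herself"}
MASCULINE = {"he", "him", "his", "himself"}


def pronoun_counts(text: str) -> Counter:
    """Tally feminine and masculine pronouns in one text."""
    counts = Counter()
    # Lowercase the text and split it into alphabetic tokens.
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in FEMININE:
            counts["feminine"] += 1
        elif token in MASCULINE:
            counts["masculine"] += 1
    return counts


def corpus_counts(corpus: dict[str, str]) -> dict[str, Counter]:
    """Apply the same automated count to every document in a corpus."""
    return {title: pronoun_counts(text) for title, text in corpus.items()}


if __name__ == "__main__":
    sample = {"excerpt": "She said he would call her."}
    for title, counts in corpus_counts(sample).items():
        print(title, counts["feminine"], counts["masculine"])  # excerpt 2 1
```

The point of the design is scale: the same few lines run unchanged over one sentence or thousands of novels, which is what makes patterns visible that a close reading of one book at a time could miss.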

Many of our campus scholars ask important questions about socio-cultural trends. Often, the content they need to study–let’s say content embedded in scientific journals–is protected by copyright. Our office helps these researchers understand that their text mining research methodologies can be fair use. However, if the researchers also want to share the content that they are analyzing with others–so that other scholars can verify the algorithms being used, or query the text for different questions–then, researchers might be pushing the limit of what is a “fair” amount of republishing or redistribution of copyright-protected text. Again, as more content enters the public domain, these barriers disappear.

We hope this Public Domain Day 2019 explainer has helped clarify the mechanics and frame the significance of what happened on Jan. 1. If you’d like to learn more or need some copyright help, be in touch!

What a semester! What’s up next?

Photo by Karen Lau on Unsplash

Is it just us, or was fall semester a whirlwind? The Office of Scholarly Communication Services was steeped in a steady flurry of activity, and suddenly it’s December! We wanted to take a moment to highlight what we’ve been up to since August, and give you a preview of what’s ahead for spring.

We did the math on our affordable course content pilot program, which ran for academic year 2017-2018 and Fall 2018. This pilot supported just over 40 courses and 2400 students, and is estimated to have yielded approximately $200,000 in student savings. We’ll be working with campus on next steps for helping students save money. If you have questions about how to make your class more affordable, you can check out our site or e-mail us.

We dug deep into scholarly publishing skills with graduate students and early career researchers during our professional development workshop series. We engaged learners in issues like copyright and their dissertations, moving from dissertation to first book, and managing and maximizing scholarly impact. Publishing often isn’t complete without sharing one’s data, so we helped researchers understand how to navigate research data copyright and licensing issues at #FSCI2018.

We helped instructors and scholars publish open educational resources and digital books with PressbooksEDU on our new open books hub.

On behalf of the UC’s Council of University Librarians, we chaired and hosted the Choosing Pathways to OA working forum. The forum brought together approximately 125 representatives of libraries, consortia, and author communities throughout North America to develop personalized action plans for how we can all transition funds away from subscriptions and toward sustainable open access publishing. We will be reporting on forum outcomes in 2019. In the meantime, one immediate result was the formation of a working group to support scholarly society journal publishers in flipping their journals from closed access to open access. Stay tuned for an announcement in January.

We funded dozens of Open Access publications by UC Berkeley authors through our BRII program.

We developed a novel literacies workflow for text data mining researchers. Text mining allows researchers to use automated techniques to glean trends and information from large volumes of unstructured textual sources. Researchers often perceive legal stumbling blocks to conducting this type of research, since some of the content is protected by copyright or other use restrictions. In Fall 2018, we began training the UC Berkeley community on how to navigate these challenges so that they can confidently undertake this important research. We’ll have a lot more to say about our work on this soon!

Next semester, we’re continuing all of these efforts with a variety of scholarly publishing workshops. We invite you to check out: Copyright & Fair Use for Digital Projects, Text Data Mining & Publishing: Legal Literacies, Copyright for Wikipedia Editing, and more.

We would like to thank Arcadia, a charitable fund of Lisbet Rausing and Peter Baldwin, for their generous support in helping to make the work of the Office of Scholarly Communication Services possible.

Lastly, we’d like to thank all of you for your engagement and support this semester! Please let us know how else we can serve you. In the meantime, we wish you a Happy New Year!

E-mail: schol-comm@berkeley.edu

Twitter: @UCB_scholcomm

Website: lib.berkeley.edu/scholcomm

Toolkit to Help Research Institutions Transition to Open Access

closed books on a shelf in a dimly-lit library

Today, all ten University of California campus libraries released a Pathways to Open Access toolkit to help research libraries and organizations around the world make the same kinds of difficult decisions that we’ve been undertaking about repurposing campus subscription spends to support sustainable open access publishing.

“Essentially no research institutions in the world,” says UC Berkeley University Librarian Jeffrey MacKie-Mason, “can afford to provide their scholars with access to the full corpus of scholarly literature being produced and then sequestered behind increasingly out-of-reach subscription paywalls that yield major academic publishers a nearly 40 percent profit margin.”

In the Pathways documents, linked below, the campus libraries critically analyzed different open access publishing models, as well as the various funding and other strategies to achieve them, and then developed a set of possible next steps on which UC libraries could partner or experiment. The libraries hope this toolkit will be equally valuable to other institutions wrestling with how to make strides in moving away from a closed-access publishing landscape.

You can read more about the toolkit here: http://news.lib.berkeley.edu/pathways-to-open-access

With any questions, please contact schol-comm@berkeley.edu.

Welcome Maria Gould: Scholarly Communication & Copyright Librarian

Photo of Maria Gould
Maria Gould, Scholarly Communication & Copyright Librarian

The Library’s Office of Scholarly Communication Services is thrilled to announce that Maria Gould has joined as our new Scholarly Communication & Copyright Librarian. Maria started on January 16, and has already begun helping scholars on this campus and beyond in shaping their scholarly publishing skills and publishing impact.

While Maria is new to this position, she is not new to UC Berkeley, having received her MA in Latin American Studies here in 2011. Maria also obtained her MLIS from the Simmons School of Library and Information Science, and completed an internship at the Bancroft Library while pursuing her degree.

Maria’s substantial scholarly publishing experience makes her an incredibly valuable resource for campus researchers. Prior to joining the Library, Maria worked for PLOS (Public Library of Science) for six years, where, among other projects, she was responsible for developing staff training and resources, updating and maintaining policy guidance and system instructions for authors and peer reviewers, and supporting outreach and engagement initiatives for editorial board members and the reviewer community.

Already in just her first month at the Library, she has helped shepherd our work in support of open digital scholarship and affordable course content. With this warm welcome to her, we hope you will reach out to all of us in the Office of Scholarly Communication Services for your publishing needs. We can help with:

  • Copyright in research, publishing & teaching
  • Authors’ rights, and protecting & managing your intellectual property
  • Scholarly publishing options and platforms
  • Open access for scholarship and research data
  • Affordable and open course content
  • Tracking & increasing scholarly impact

Want help or more information? We provide:

  • Individualized support & personal consultations
  • In-class and online instruction
  • Presentations and workshops for small or large groups & classes
  • Customized support and training for each department and discipline
  • Online guidance and resources

Learn more at: lib.berkeley.edu/scholcomm

Keep up to date: @UCB_scholcomm

Opening UC History and Success to the World

Photograph of Jud King

150 years after its founding in 1868, the University of California is regarded by many as the most successful and highly respected public research university in the world. In his new book, Judson King, former Berkeley and University of California Provost and former CSHE director, explores the most important factors behind this academic success, and what makes UC tick. What’s more, he’s made his insightful analysis available to the world by publishing his book open access.

Please join Judson King, Chancellor Carol T. Christ, University Librarian Jeff MacKie-Mason, and CSHE administration for a special event and reception delving into the academic history of the University of California, and examining how best it can be shared to inspire global institutional development.


Event Details:

  • Discussion of The University of California: Creating, Nurturing, and Maintaining Academic Quality in a Public University Setting
  • February 28, 2018, at the Morrison Library from 5:00 p.m. – 6:30 p.m.
  • Refreshments and hors d’oeuvres will be served
  • RSVP required

This event is co-sponsored by the Library’s Office of Scholarly Communication Services and the Center for Studies in Higher Education. It is also offered in connection with Berkeley’s celebration of 150 Years of Light.



Wondering how, where, and what to publish? Our symposium has you covered.

Typewriter with "Ready to Get Published" written on page.
Get ready for publishing at our scholarly publishing symposium.

How, Where, and What to Publish: UC Berkeley Scholarly Publishing Symposium
January 31, 8:30 a.m.-12:30 p.m.

309 Sproul Hall (Graduate Professional Development Center)

Register online: bit.ly/013018pubsymposium



Are you an early career researcher looking to make a mark? Come hear from leading scholarly journal and book publishers (such as Elsevier, Springer-Nature, and UC Press) and open publishing framework and platform creators (such as Collaborative Knowledge Foundation and California Digital Library) during a half-day symposium in which experts cover all aspects of how, where, and what to publish.

Panel presentations and participant discussions will address:

Publishing Essentials

  • Targeting the “right” journal for submission
  • Overview of the editorial process from submission to acceptance, and responding to reviewer comments
  • Publishing ethics
  • Communicating your research with a broad audience

Trends in Open Scholarship

  • Value of publishing open access
  • Publishing preprints & post-prints
  • Avoiding predatory publishers
  • Trends in peer review
  • Metrics

Data Publishing

  • Open data, publishing mandates, and publishing options
  • Research data management
  • Licensing research data for reuse

Refreshments will be provided.

This event is co-sponsored by the Library’s Office of Scholarly Communication Services and Research Data Management Program.