UC Berkeley Library to Copyright Office: Protect fair uses in AI training for research and education

Madison Building, Library of Congress
Copyright Matt H. Wade, licensed CC-BY-NC-SA 3.0

We are pleased to share the UC Berkeley Library’s response to the U.S. Copyright Office’s Notice of Inquiry regarding artificial intelligence and copyright. Our response addresses the essential fair use right relied upon by UC Berkeley scholars in undertaking groundbreaking research, and the need to preserve access to the underlying copyright-protected content so that scholars using AI systems can conduct research inquiries.

In this blog post, we explain what the Copyright Office is studying, and why it was important for the Library to make scholars’ voices heard.

What the Copyright Office is studying and why

Loosely speaking, the Copyright Office wants to understand how to set policy for copyright issues raised by artificial intelligence (“AI”) systems.

Over the last year, AI systems and the rapid growth of their capabilities have attracted significant attention. One type of AI, referred to as “generative AI”, is capable of producing outputs such as text, images, video, or audio (including emulating a human voice) that would be considered copyrightable if created by a human author. These systems include, for instance, the chatbot ChatGPT, and text-to-image generators like DALL·E, Midjourney, and Stable Diffusion. A user can prompt ChatGPT to write a short story that features a duck and a frog who are best friends, or prompt DALL·E to create an abstract image in the style of a Jackson Pollock painting. Generative AI systems are relevant to and impact many educational activities on a campus like UC Berkeley, but (at least to date) have not been the key facilitator of campus research methodologies. 

Instead, in the context of research, scholars have been relying on AI systems to support a set of research methodologies referred to as “text and data mining” (or TDM). TDM utilizes computational tools, algorithms, and automated techniques to extract revelatory information from large sets of unstructured or thinly-structured digital content. Imagine you have a book like “Pride and Prejudice.” There are nearly infinite volumes of information stored inside that book, depending on your scholarly inquiry, such as how many female vs. male characters there are, what types of words the female characters use as opposed to the male characters, what types of behaviors the female characters display relative to the males, etc. TDM allows researchers to identify and analyze patterns, trends, and relationships across volumes of data that would otherwise be impossible to sift through on a close examination of one book or item at a time. 

Not all TDM research methodologies necessitate the usage of AI systems to extract this information. For instance, as in the “Pride and Prejudice” example above, sometimes TDM can be performed by developing algorithms to detect the frequency of certain words within a corpus, or to parse sentiments based on the proximity of various words to each other. In other cases, though, scholars must employ machine learning techniques to train AI models before the models can make a variety of assessments. 
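To make the non-AI case concrete, here is a minimal sketch of the two techniques just described: counting word frequencies and counting how often words appear near each other. The function names, the sample passage, and the window size are illustrative assumptions rather than any particular scholar's method, and a real project would run over a full corpus rather than a single sentence.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequencies(text, terms):
    """Return how many times each search term occurs in the text."""
    counts = Counter(tokenize(text))
    return {term: counts[term] for term in terms}


def cooccurrences(text, target, context_terms, window=5):
    """Count context terms found within `window` tokens of each `target` token."""
    tokens = tokenize(text)
    context = set(context_terms)
    hits = 0
    for i, token in enumerate(tokens):
        if token == target:
            start = max(0, i - window)
            neighbors = tokens[start:i] + tokens[i + 1:i + 1 + window]
            hits += sum(1 for n in neighbors if n in context)
    return hits


# Hypothetical sample passage, invented for illustration only.
sample = (
    "Her joy was plain to see. Happiness, she said, is a quiet joy; "
    "he felt only a dull contentment."
)

print(term_frequencies(sample, ["happiness", "joy", "mirth", "contentment"]))
print(cooccurrences(sample, "she", ["happiness", "joy"]))
print(cooccurrences(sample, "he", ["happiness", "joy"]))
```

Neither function involves training a model: both apply fixed rules to the text, which is why this kind of TDM does not depend on AI systems.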

Here is an illustration of the distinction: Imagine a scholar wishes to assess the prevalence with which 20th-century fiction authors write about notions of happiness. The scholar likely would compile a corpus of thousands or tens of thousands of works of fiction, and then run a search algorithm across the corpus to detect the occurrence or frequency of words like “happiness,” “joy,” “mirth,” “contentment,” and synonyms and variations thereof. But if a scholar instead wanted to establish the presence of fictional characters who embody or display characteristics of being happy, the scholar would need to employ discriminative modeling (a classification and regression technique) that can train AI to recognize the appearance of happiness by looking for recurring indicia of character psychology, behavior, attitude, conversational tone, demeanor, appearance, and more. This is not using a generative AI system to create new outputs, but rather training a non-generative AI system to predict or detect existing content. And to undertake this type of non-generative AI training, a scholar would need to use a large volume of often copyright-protected works.
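The second scenario can be sketched with a toy discriminative classifier. The example below is an assumption-laden illustration, not a description of how any research team actually works: it uses a simple perceptron over bag-of-words features, and the four labeled passages are invented. Real studies would rely on far larger labeled corpora and established machine learning libraries.

```python
import re
from collections import defaultdict


def features(text):
    """Bag-of-words features: the set of lowercase word tokens in a passage."""
    return set(re.findall(r"[a-z']+", text.lower()))


def train_perceptron(examples, epochs=10):
    """Learn one weight per word from (passage, label) pairs.

    Labels are +1 (character appears happy) or -1 (character does not).
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = features(text)
            score = bias + sum(weights[w] for w in feats)
            predicted = 1 if score > 0 else -1
            if predicted != label:
                # Nudge the weights toward the correct label on a mistake.
                for w in feats:
                    weights[w] += label
                bias += label
    return weights, bias


def predict(model, text):
    """Classify a passage as +1 (happy) or -1 (not happy)."""
    weights, bias = model
    score = bias + sum(weights.get(w, 0.0) for w in features(text))
    return 1 if score > 0 else -1


# Hypothetical hand-labeled training passages, invented for illustration.
training = [
    ("She laughed and danced across the garden, humming to herself.", 1),
    ("He smiled warmly and hugged his sister at the door.", 1),
    ("She wept alone and stared at the cold wall.", -1),
    ("He sighed, slumped in his chair, and refused to eat.", -1),
]

model = train_perceptron(training)
print(predict(model, "He laughed and hugged his friend."))   # 1
print(predict(model, "She slumped in the chair and wept."))  # -1
```

Note that neither test passage contains the word “happy” or any synonym. The model labels them from behavioral cues it learned during training, which a keyword search could not do, and the model only detects existing content rather than generating anything new.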

The Copyright Office is studying both of these kinds of AI systems—that is, both generative AI and non-generative AI. They are asking a variety of questions in response to having been contacted by stakeholders across sectors and industries with diverse views about how AI systems should be regulated. Some of the concerns expressed by stakeholders include: 

  • Who is the “author” of generative AI outputs?
  • Should people whose voices or images are used to train generative AI systems have a say in how their voices or images are used? 
  • Should the creator of an AI system (whether generative or non-generative) need permission from copyright holders to use copyright-protected materials in training the AI to predict and detect things?
  • Should copyright owners get to opt out of having their content used to train AI?
  • Should ethics be considered within copyright regulation?

Several of these questions are already the subject of pending litigation. While these questions are being explored by the courts, the Copyright Office wants to understand the entire landscape better as it considers what kinds of AI copyright regulations to enact.

The copyright law and policy landscape underpinning the use of AI models is complex, and whatever regulatory decisions the Copyright Office makes will have ramifications for global enterprise, innovation, and trade. The Copyright Office’s inquiry thus raises significant and timely legal questions, many of which we are only beginning to understand.

For these reasons, the Library has taken a cautious and narrow approach in its response to the inquiry: we address only two key principles known about fair use and licensing, as these issues bear upon the nonprofit education, research, and scholarship undertaken by scholars who rely on (typically non-generative) AI models. In brief, the Library wants to ensure that (1) scholars’ voices, and those of the academic libraries that support them, are heard to preserve fair use in training AI, and that (2) copyright-protected content remains available for AI training to support nonprofit education and research.

Why the study matters for fair use

Previous court cases like Authors Guild v. HathiTrust, Authors Guild v. Google, and A.V. ex rel. Vanderhye v. iParadigms have addressed fair use in the context of TDM and determined that the reproduction of copyrighted works to create and text mine a collection of copyright-protected works is a fair use. These cases further hold that making derived data, results, abstractions, metadata, or analysis from the copyright-protected corpus available to the public is also fair use, as long as the research methodologies or data distribution processes do not re-express the underlying works to the public in a way that could supplant the market for the originals. Performing all of this work is essential for TDM-reliant research studies.

For the same reasons that the TDM process is fair use of copyrighted works, the training of AI tools to do that TDM should also be fair use, in large part because training does not reproduce or communicate the underlying copyrighted works to the public. Here, there is an important distinction to make between training inputs and outputs, in that the overall fair use of generative AI outputs cannot always be predicted in advance: The mechanics of generative models’ operations suggest that there are limited instances in which generative AI outputs could indeed be substantially similar to (and potentially infringing of) the underlying works used for training; this substantial similarity is possible typically only when a training corpus is rife with numerous copies of the same work. However, the training of AI models by using copyright-protected inputs falls squarely within what courts have determined to be a transformative fair use, especially when that training is for nonprofit educational or research purposes. And it is essential to protect the fair use rights of scholars and researchers to make these uses of copyright-protected works when training AI.

Further, were these fair use rights overridden by limiting AI training access to only “safe” materials (like public domain works or works for which training permission has been granted via license), this would exacerbate bias in the nature of research questions able to be studied and the methodologies available to study them, and amplify the views of an unrepresentative set of creators given the limited types of materials available with which to conduct the studies.

Why access to AI training content should be preserved

For the same reasons, it is important that scholars’ ability to access the underlying content to conduct AI training be preserved. For good reason, the fair use provision of the Copyright Act does not afford copyright owners a right to opt out of allowing other people to use their works: if content creators were able to opt out, the provision for fair use would be undermined, and little content would be available to build upon for the advancement of science and the useful arts. Accordingly, to the extent that the Copyright Office is considering creating a regulatory right for creators to opt out of having their works included in AI training, it is paramount that such an opt-out provision not be extended to any AI training or activities that constitute fair use, particularly in the nonprofit educational and research contexts.

AI training opt-outs would be a particular threat for research and education because fair use in these contexts is already becoming an out-of-reach luxury even for the wealthiest institutions. Academic libraries are forced to pay significant sums each year to try to preserve fair use rights for campus scholars through the database and electronic content license agreements that libraries sign. In the U.S., the prospect of “contractual override” means that, although fair use is statutorily provided for, private parties (like publishers) may “contract around” fair use by requiring libraries to negotiate for otherwise lawful activities (such as conducting TDM or training AI for research), and often to pay additional fees for the right to conduct these lawful activities on top of the cost of licensing the content itself. When such costs are beyond institutional reach, the publisher or vendor may then offer similar contractual terms directly to research teams, who may feel obliged to agree in order to get access to the content they need. Vendors may charge tens or even hundreds of thousands of dollars for this type of access.

This “pay-to-play” landscape of charging institutions for the opportunity to rely on existing statutory rights is particularly detrimental for TDM research methodologies, because TDM research often requires use of massive datasets with works from many publishers, including copyright owners who cannot be identified or who are unwilling to grant licenses. If the Copyright Office were to enable rightsholders to opt out of having their works fairly used for training AI, then academic institutions and scholars would face even greater hurdles in licensing content for research purposes.

First, it would be operationally difficult for academic publishers and content aggregators to amass and license the “leftover” body of copyrighted works that remain eligible for AI training. Costs associated with publishers’ efforts in compiling “AI-training-eligible” content would be passed along as additional fees charged to academic libraries. In addition, rightsholders might opt out of allowing their work to be used for AI training fair uses, and then turn around and charge AI usage fees to scholars (or libraries)—essentially licensing back fair uses for research. These scenarios would impede scholarship by or for research teams who lack grant or institutional funds to cover these additional expenses; penalize research in or about underfunded disciplines or geographical regions; and result in bias as to the topics and regions studied. 

Scholars need to be able to utilize existing knowledge resources to create new knowledge goods. Congress and the Copyright Office clearly understand the importance of facilitating access and usage rights, having implemented the statutory fair use provision without any exclusions or opt-outs. This status quo should be preserved for fair use AI training—and particularly in the nonprofit educational or research contexts. 

Our office is here to help

No matter what happens with the Copyright Office’s inquiry and any regulations that ultimately may be established, the UCB Library’s Office of Scholarly Communication Services is here to help you. We are a team of copyright law and information policy (licensing, privacy, and ethics) experts who help UC Berkeley scholars navigate legal, ethical, and policy considerations in utilizing resources in their research and teaching. And we are national and international leaders in supporting TDM research, offering online tools, trainings, and individual consultations to support your scholarship. Please feel free to reach out to us with any questions at schol-comm@berkeley.edu.


“Robert Cox: Sierra Club President 1994-96, 2000-01, and 2007-08, on Environmental Communications and Strategy,” oral history release

New oral history: “Robert Cox: Sierra Club President 1994-96, 2000-01, and 2007-08, on Environmental Communications and Strategy”

Video clip from Robert Cox’s oral history on the Sierra Club’s environmental justice work with Jesus People Against Pollution (JPAP) in 1994

Black and white photograph of Robert Cox wearing a polo-style shirt while standing in front of a wall of leafy bushes
UNC Professor Robert Cox in 1994 upon his first time being elected as president of the national Sierra Club

Robert Cox is a scholar and a gentleman. He also has a fire burning in his belly for protecting nature, confronting injustice, and empowering people, which fueled his long-time leadership in environmental politics, strategy, and influential communication. Robbie Cox served three times as president of the national Sierra Club in 1994-96, 2000-01, and 2007-08. He is Professor Emeritus at the University of North Carolina at Chapel Hill (UNC-CH), and as a scholar of activist rhetoric, Cox helped found the academic field of environmental communication.

Robbie and I recorded nearly eleven hours of his life history over Zoom during five interview sessions in September 2020, during the early months of the COVID-19 pandemic. Robbie’s inspiring stories of environmental activism produced a 253-page transcript, which includes an appendix with several photographs. The stories that Robbie shared in his oral history also emphasized the incredibly high stakes for our present moment of environmental politics, rhetoric, and civic engagement.

Cox was born in September 1945, in Hinton, West Virginia, where his early influences included roaming Appalachian forests and rivers as well as his family’s history of union organizing and work toward social justice. He was recruited to the debate team at the University of Richmond where, from 1963 to 1967, he studied communication, philosophy, history, and religion while also participating in civil rights protests. In 1970, Cox earned his Ph.D. in classical rhetoric studies from the University of Pittsburgh with a dissertation on the rhetorical structures of the Vietnam antiwar movement in which he actively participated. From 1971 to 2010, Cox was a Professor in the Department of Communication at UNC-CH where he helped establish the field of environmental communication and focused his research and teaching on argumentation, rhetorical theory, and social movements. Cox married Professor Julia Wood in 1975 when she also joined the UNC-CH faculty in the Department of Communication.

Video clip from Robert Cox’s oral history on first joining the Sierra Club in 1979

Black and white photograph of Joe Grimsley wearing a flannel shirt and trucker-style baseball hat while talking to Robbie Cox, who is wearing a dark button-up shirt
Robert Cox (right) with North Carolina Secretary of Natural Resources Joe Grimsley (left) discussing what would become the North Carolina Wilderness Act of 1984.

Upon Dr. Wood’s suggestion, Cox joined the Sierra Club in 1979 and, over time, he earned leadership positions at every level in the Club: as chair of the Research Triangle Group, as chair of the North Carolina Chapter, and as an elected member of the national board of directors for most years between 1993 and 2013, including three times as president of the national Sierra Club. Cox made significant contributions to the passage in the US Congress of the North Carolina Wilderness Bill, to the Sierra Club’s early engagements in the environmental justice movement, and to the restructuring of both the Club’s internal governance and its volunteer structure. He also helped lead Sierra Club engagements in national politics, particularly during his times as Club president. In this oral history, Cox discusses all of the above, with a focus on leveraging influential communication and strategy, while also sharing his experiences hiking and trekking in the Himalayas, in the mountains of Europe, and in the Appalachian Mountains.

Robbie Cox’s oral history is significant for detailing the environmental activism and political strategies of one of the most influential volunteers in recent Sierra Club history. Some of the themes throughout Robbie’s oral history include the profoundly democratic nature of the Sierra Club, details on the Club’s geographically diverse grassroots activism, as well as numerous ways that volunteer environmentalists work together to shape state and national legislation. Robbie also reconstructed the ways he balanced his double life as UNC professor with his life as an environmental activist, especially through his work in Sierra Club media campaigns. He recounted his decades as a nationally elected volunteer leader in the Sierra Club, as told through the perspective of an academic scholar of rhetoric and communications. And throughout, Robbie shared stories of direct action for environmental causes at all levels of Sierra Club engagement, from local to national.

Video clip from Robert Cox’s oral history on passing the North Carolina Wilderness Act in 1984

The in-depth, life-history approach used in this oral history reveals ways that Robbie’s personal influences and his engagements in the Sierra Club evolved over time. For instance, Robbie’s family history of labor activism instilled in him the power of people and the importance of social justice. Similarly, his participation on debate teams substantially shaped his education and academic work, while also playing a central role throughout his life as a political and environmental activist. Robbie’s interview also explored the Sierra Club’s and his own personal engagements with environmental justice, including his attendance at the First National People of Color Environmental Justice Leadership Summit in 1991, his leveraging of media in the national Sierra Club’s partnership with “Jesus People Against Pollution” in Mississippi, as well as his experiences on toxic tours of colonias in Matamoros, Mexico, along with other actions against the negative results of neoliberal free trade agreements.

Black and white photograph of Robert Cox wearing a coat and tie and speaking into a microphone while surrounded by environmentalists who hold Sierra Club signs
Robert Cox (center) speaking in November 1995 as Sierra Club president at the US Capitol Building while delivering to House Speaker Newt Gingrich several green bags containing copies of the Environmental Bill of Rights petition signed by more than a million Americans.

Robbie also shared insider details on several significant moments in the Sierra Club’s recent history. He recounted the Club’s severe financial crises in the 1990s that resulted in his work to reorganize the Club’s internal governance through Project Renewal as well as the Club’s volunteer structures via Project ACT. Robbie recounted his central role in the Sierra Club’s efforts to combat the de-regulatory and anti-environmental Congressional agenda in the wake of Newt Gingrich’s Republican take-over of Congress in the 1990s, as well as Robbie’s personal role in securing the Sierra Club’s endorsement of Al Gore, for whom Robbie campaigned in 2000. Robbie also detailed the central role he played in the Groundswell Sierra campaign in the early 2000s to resist a take-over of the Sierra Club by anti-immigration and white supremacist forces. And as the world warms and the seas rise, Robbie discussed ways that the Sierra Club has confronted the compounding crises of climate change in the twenty-first century. Robbie’s decades of environmental activism provide a lens on ways the environmental movement has evolved over time from its early focus on wild lands, to concerns about human health, to engagement on issues of environmental justice, to the modern complexities of climate change. Robbie also reflects on the contemporary Sierra Club’s internal and external challenges in its ongoing work for equity, inclusion, and justice.

Video clip from Robert Cox’s oral history on delivering to Congress the Environmental Bill of Rights with 1.2 million signatures in 1995

Color photograph of Robert Cox talking with Albert Gore, with both men wearing a collared Polo-style shirt and pleated khaki pants
Robert Cox (left) and US Vice President Al Gore (right) in July 2000 in Grand Rapids, Michigan, delivering the Sierra Club’s public endorsement of Mr. Gore for US President during the 2000 election.

Back in the summer of 2020, when I spoke with Carl Pope, former Sierra Club executive director, to prepare for Robbie’s oral history, Pope recalled Robbie’s exceptional leadership and effectiveness. Pope described Robbie’s presence as “immediately noticeable” when “Professor Cox” first won election to the national Sierra Club board of directors in the 1990s. Pope told me how Robbie used his expertise in rhetoric to unify people and advance proposals for environmental action. “You could see Robbie work at a board meeting,” Pope remembered. “When he wanted to get the board to agree, he would offer some initial proposal tentatively, then let folks respond to it and let the room talk. Then he’d come back in and make the same proposal, but he changed two words to see if that worked. He’d keep playing with the proposal and make changes rhetorically, until he got something that would work for everyone.” The Sierra Club’s directors increasingly come from a variety of backgrounds across the United States. All directors are volunteers, not employed staff, but like much of the Sierra Club staff, many Club directors consider themselves to be full-time environmental activists. As Carl Pope noted, however, most Sierra Club directors “are not professional communicators. People would talk past each other. Robbie’s skill on the board lubricated that process, which was phenomenally helpful. If anyone wanted to get something done, you asked Robbie.” Indeed, Robbie Cox got things done.

Pope also described Robbie as a kind of environmental philosopher. “He wasn’t ideological,” Pope explained, “but surely, he had his own vision of where the Club should go.” Now, with this publication of Robbie Cox’s oral history, you too can have him tell you in his own words about his visions for the Sierra Club and the ways he mobilized constituencies to make a reality of his visions for environmental protection, political power, and justice.

ABOUT THE ORAL HISTORY CENTER

The Oral History Center of The Bancroft Library preserves voices of people from all walks of life, with varying political perspectives, national origins, and ethnic backgrounds. We are committed to open access and our oral histories and interpretive materials are available online at no cost to scholars and the public. You can find our oral histories from the search feature on our home page. Search by name, keyword, and several other criteria. Sign up for our monthly newsletter featuring think pieces, new releases, podcasts, Q&As, and everything oral history. Access the most recent articles from our home page or go straight to our blog home.

Please consider making a tax-deductible donation to the Oral History Center if you’d like to see more work like this conducted and made freely available online. While we receive modest institutional support, we are a predominantly self-funded research unit of The Bancroft Library. We must raise the funds to cover the cost of all the work we do, including each oral history. You can give online, or contact us at ohc@berkeley.edu for more information about our funding needs for present and future projects.


Workshop Reminder — How to Publish Open Access at UC Berkeley on October 17, 2023

A presentation slide with blue background, library logo, and text about the event that reads: "How to Publish Open Access at UC Berkeley; UC Berkeley Library; Office of Scholarly Communication Services; October 17, 2023"

Date/Time: Tuesday, October 17, 2023, 11:00am–12:30pm
Location: Zoom only. Register via LibCal.

Are you wondering what processes, platforms, and funding are available at UC Berkeley to publish your research open access (OA)? This workshop will provide practical guidance and walk you through all of the OA publishing options and funding sources you have on campus. We’ll explain: the difference between (and mechanisms for) self-depositing your research in the UC’s institutional repository vs. choosing publisher-provided OA; what funding is available to put toward your article or book charges if you choose a publisher-provided option; and the difference between funding coverage under the UC’s systemwide OA agreements vs. the Library’s funding program (Berkeley Research Impact Initiative). We’ll also give you practical tips and tricks to maximize your retention of rights and readership in the publishing process.

Join us next week!

 


Workshop Reminder — Managing & Maximizing Your Scholarly Impact on October 10, 2023

A presentation slide with dark blue background, library logo, and text about the event that reads: "Managing & Maximizing Your Scholarly Impact; UC Berkeley Library; Office of Scholarly Communication Services; October 10, 2023"

Date/Time: Tuesday, October 10, 2023, 11:00am–12:30pm
Location: Hybrid: Join in person at 223 Doe Library, or on Zoom. Register via LibCal.

This workshop will provide you with practical strategies and tips for promoting your scholarship, increasing your citations, and monitoring your success. You’ll also learn how to understand metrics, use scholarly networking tools, evaluate journals and publishing options, and take advantage of funding opportunities for Open Access scholarship.

Join us next week!

 


Wrapping up our NEH-funded project to help text and data mining researchers navigate cross-border legal and ethical issues

Black and white photograph with grass and concrete with the word "finish" painted on the concrete in large capitalized letters.
Image via rawpixel, public domain

In August 2022, the UC Berkeley Library and Internet Archive were awarded a grant from the National Endowment for the Humanities (NEH) to study legal and ethical issues in cross-border text and data mining (TDM).

The project, entitled Legal Literacies for Text Data Mining – Cross-Border (“LLTDM-X”), supported research and analysis to address law and policy issues faced by U.S. digital humanities practitioners whose text data mining research and practice intersects with foreign-held or -licensed content, or involves international research collaborations.

LLTDM-X is now complete, resulting in the publication of an instructive case study for researchers and a white paper. Both resources are explained in greater detail below.

Project Origins

LLTDM-X built upon the previous NEH-sponsored institute, Building Legal Literacies for Text Data Mining. That institute provided training, guidance, and strategies to digital humanities TDM researchers on navigating legal literacies for text data mining (including copyright, contracts, privacy, and ethics) within a U.S. context.

A common challenge highlighted during the institute was the fact that TDM practitioners encounter expanding and increasingly complex cross-border legal problems. These include situations in which: (i) the materials they want to mine are housed in a foreign jurisdiction, or are otherwise subject to foreign database licensing or laws; (ii) the human subjects they are studying or who created the underlying content reside in another country; or (iii) the colleagues with whom they are collaborating reside abroad, yielding uncertainty about which country’s laws, agreements, and policies apply.

Project design

We designed LLTDM-X to identify and better understand the cross-border issues that digital humanities TDM practitioners face, with the aim of using these issues to inform prospective research and education. Secondarily, we hoped that LLTDM-X would also suggest preliminary guidance to include in future educational materials. In early 2023, we hosted a series of three online round tables with U.S.-based cross-border TDM practitioners and law and ethics experts from six countries. 

The round table conversations were structured to illustrate the empirical issues that researchers face, and also for the practitioners to benefit from preliminary advice on legal and ethical challenges. Upon the completion of the round tables, the LLTDM-X project team created a hypothetical case study that (i) reflects the observed cross-border LLTDM issues and (ii) contains preliminary analysis to facilitate the development of future instructional materials.

We also charged the experts with providing responsive and tailored written feedback to the practitioners about how they might address specific cross-border issues relevant to each of their projects.

Guidance & Analysis

Case Study

Extrapolating from the issues analyzed in the round tables, the practitioners’ statements, and the experts’ written analyses, the Project Team developed a hypothetical case study reflective of “typical” cross-border LLTDM issues that U.S.-based practitioners encounter. The case study provides basic guidance to support U.S. researchers in navigating cross-border TDM issues, while also highlighting questions that would benefit from further research. 

The case study examines cross-border copyright, contracts, and privacy & ethics variables across two distinct paradigms: first, a situation where U.S.-based researchers perform all TDM acts in the U.S., and second, a situation where U.S.-based researchers engage with collaborators abroad, or otherwise perform TDM acts both in the U.S. and abroad.

White Paper

The LLTDM-X white paper provides a comprehensive description of the project, including origins and goals, contributors, activities, and outcomes. Of particular note are several project takeaways and recommendations, which we hope will help inform future research and action to support cross-border text data mining. Our project takeaways touched on seven key themes: 

  1. Uncertainty about cross-border LLTDM issues indeed hinders U.S. TDM researchers, confirming the need for education about cross-border legal issues; 
  2. The expansion of education regarding U.S. LLTDM literacies remains essential, and should continue in parallel to cross-border education; 
  3. Disparities in national copyright, contracts, and privacy laws may incentivize TDM researcher “forum shopping” and exacerbate research bias;
  4. License agreements (and the concept of “contractual override”) often dominate the overall analysis of cross-border TDM permissibility;
  5. Emerging lawsuits about generative artificial intelligence may impact future understanding of fair use and other research exceptions; 
  6. Research is needed into issues of foreign jurisdiction, likelihood of lawsuits in foreign countries, and likelihood of enforcement of foreign judgments in the U.S. However, the overall “risk” of proceeding with cross-border TDM research may remain difficult to quantify; and
  7. Institutional review boards (IRBs) have an opportunity to explore a new role or build partnerships to support researchers engaged in cross-border TDM.

Gratitude & Next Steps

Thank you to the practitioners, experts, and project team for making this project a success, and to the National Endowment for the Humanities for its generous funding.

We aim to share our project outputs broadly to continue helping U.S.-based TDM researchers navigate cross-border LLTDM hurdles. We will continue to speak publicly to educate researchers and the TDM community regarding project takeaways, and to advocate for legal and ethical experts to take up the essential research questions and begin developing much-needed educational materials. And we will continue to encourage the integration of LLTDM literacies into digital humanities curricula, to facilitate both domestic and cross-border TDM research.

[Note: this content is cross-posted on the LLTDM blog.]


Workshop Reminder — Copyright and Your Dissertation on September 27, 2023

A presentation slide with green background, library logo, and text about the event that reads: "Copyright (+ other Laws & Policies) & Your Dissertation; UC Berkeley Library; Office of Scholarly Communication Services; September 27, 2023"

Date/Time: Wednesday, September 27, 2023, 11:00am–12:30pm
Location: Hybrid: Join in person at 223 Doe Library, or on Zoom. Register via LibCal.

This workshop will provide you with practical guidance for navigating copyright questions and other legal considerations for your dissertation or thesis. Whether you’re just starting to write or you’re getting ready to file, you can use our tips and workflow to figure out what you can use, what rights you have as an author, and what it means to share your dissertation online.

Some questions we’ll answer during the workshop include:

  • What’s mine after I’m done writing my dissertation?
  • Can I re-use previous scholarly articles I’ve written?
  • Can I use content created by others? How?
  • Where does my dissertation end up online? When?

Join us!

 


Workshop Reminder — Publish Digital Books & Open Educational Resources with Pressbooks

A presentation slide with dark red background, library logo, and text about the event that reads: "Public digital books and open educational resources with Pressbooks. Berkeley Library, Timothy Vollmer, Scholarly Communication + Copyright Librarian, Office of Scholarly Communication Services, September 20, 2023."

Date/Time: Wednesday, September 20, 2023, 11:00am–12:30pm
Location: Zoom only. Register via LibCal and you’ll receive the Zoom link for the event.

If you’re looking to self-publish work of any length and want an easy-to-use tool that offers a high degree of customization, allows flexibility with publishing formats (EPUB, PDF), and provides web-hosting options, Pressbooks may be great for you. Pressbooks is often the tool of choice for academics creating digital books, open textbooks, and open educational resources, since you can license your materials for reuse however you desire. Learn why and how to use Pressbooks for publishing your original books or course materials. You’ll leave the workshop with a project already under way.

Curious about how UC Berkeley faculty, students, and staff have used Pressbooks? Check out some of the Berkeley-created digital books and resources below, or browse over 5,700 open access books on the Pressbooks Directory.

Six book covers from Pressbooks created by UC Berkeley faculty, students, and staff.


Fall 2023 copyright and publishing workshops with the Office of Scholarly Communication Services

Graphic with blue background, the Office of Scholarly Communication Services logo, and text as follows: "Office of Scholarly Communication Services, Fall 2023 Workshops: Publish Digital Books & OERs with Pressbooks; Copyright and Your Dissertation; Managing and Maximizing Your Scholarly Impact; How to Publish Open Access at UC Berkeley; From Dissertation to Book: Navigating the Publication Process."

With the school year kicking off soon in Berkeley, the Library’s Office of Scholarly Communication Services is here to help UC Berkeley faculty, students, and staff understand copyright and scholarly publishing with online resources, Zoom and in-person workshops, and consultations. Here’s what’s coming up this semester.

Workshops

Publish Digital Books & Open Educational Resources with Pressbooks

Date/Time: Wednesday, September 20, 2023, 11:00am–12:30pm
Location: Zoom only. Register via LibCal.

If you’re looking to self-publish work of any length and want an easy-to-use tool that offers a high degree of customization, allows flexibility with publishing formats (EPUB, PDF), and provides web-hosting options, Pressbooks may be great for you. Pressbooks is often the tool of choice for academics creating digital books, open textbooks, and open educational resources, since you can license your materials for reuse however you desire. Learn why and how to use Pressbooks for publishing your original books or course materials. You’ll leave the workshop with a project already under way.

Copyright and Your Dissertation

Date/Time: Wednesday, September 27, 2023, 11:00am–12:30pm
Location: In-person in Doe Library Room 223, or Zoom. Register via LibCal.

This workshop will provide you with practical guidance for navigating copyright questions and other legal considerations for your dissertation or thesis. Whether you’re just starting to write or you’re getting ready to file, you can use our tips and workflow to figure out what you can use, what rights you have as an author, and what it means to share your dissertation online.

Managing and Maximizing Your Scholarly Impact

Date/Time: Tuesday, October 10, 2023, 11:00am–12:30pm
Location: In-person in Doe Library Room 223, or Zoom. Register via LibCal.

This workshop will provide you with practical strategies and tips for promoting your scholarship, increasing your citations, and monitoring your success. You’ll also learn how to understand metrics, use scholarly networking tools, and evaluate journals and publishing options.

How to Publish Open Access at UC Berkeley

Date/Time: Tuesday, October 17, 2023, 11:00am–12:30pm
Location: Zoom only. Register via LibCal.

Are you wondering what processes, platforms, and funding are available at UC Berkeley to publish your research open access (OA)? This workshop will provide practical guidance and walk you through all of the OA publishing options and funding sources you have on campus. We’ll explain: the difference between (and mechanisms for) self-depositing your research in the UC’s institutional repository vs. choosing publisher-provided OA; what funding is available to put toward your article or book charges if you choose a publisher-provided option; and the difference between funding coverage under the UC’s systemwide OA agreements vs. the Library’s funding program (Berkeley Research Impact Initiative). We’ll also give you practical tips and tricks to maximize your retention of rights and readership in the publishing process.

From Dissertation to Book: Navigating the Publication Process

Date/Time: Thursday, November 9, 2023, 11:00am–12:30pm
Location: Zoom only. Register via LibCal.

Hear from a panel of experts—an acquisitions editor, a first-time book author, and an author rights expert—about the process of turning your dissertation into a book. You’ll come away from this panel discussion with practical advice about revising your dissertation, writing a book proposal, approaching editors, signing your first contract, and navigating the peer review and publication process.

Other ways we can help

In addition to the workshops, we’re here to help answer a variety of questions you might have on intellectual property, digital publishing, and information policy. 

Want help or more information? Send us an email. We can provide individualized support and personal consultations, online class instruction, presentations and workshops for small or large groups & classes, and customized support and training for departments and disciplines.


UC Berkeley author tips: What to do when you have to pay an open access publishing fee

This post provides information to UC Berkeley authors about programs that our Library and the UC system offer to help defray open access article processing charges. It also offers tips about how to plan or budget in advance for these fees when possible.

The University of California has been a long-time supporter of open access publishing—that is, making peer-reviewed scholarship available online without any financial, legal, or technical barriers. Just because the publishing outcome is open to be read at no cost, though, doesn’t mean the publishing enterprise as a whole is “free.” One of the most common ways for open access publishers to continue to finance their publishing and production of journals in the absence of selling subscriptions for access is to instead charge authors a fee to publish—moving from a publishing system based on paying to read to one based on paying to publish. Of course, not all methods of funding open access require authors to pay publication fees in this way. And in all cases (except those rare instances in which a publisher requests that you waive this right), the UC’s open access policy makes it possible for UC authors to share their author-accepted manuscript version of their articles on eScholarship, the UC’s research repository, immediately upon publication in a journal. 

But when a publisher does charge a fee to publish, we want to help you understand what UC Berkeley resources are available—whether from your grant funds or the University of California Libraries—to help with those costs.

Typically, publishers refer to author-facing fees as “article processing charges,” or “APCs.” APCs can range from a few hundred dollars to $10,000 or more for some select Nature journals.

UC authors may be able to cover or contribute to these fees by leveraging research accounts or grant funds (to the extent available). But there are also other University Library programs available to support payment when research accounts or grant funds are not available.

UC-wide open access publishing agreements will cover some (or all) APCs

UC corresponding authors can take advantage of funding opportunities to defray the cost of publishing their scholarship open access where their grant or other research funds come up short or are not available. The University of California libraries have entered into a growing number of systemwide transformative open access agreements with publishers. UC libraries’ transformative agreements aim to transform scholarly publishing by moving from a publication model based on subscription access to an open access model. 

When a UC-affiliated corresponding author has an article accepted for publication in a journal with which the UC has an open access publishing agreement, the UC libraries will pay some or all of the associated publishing fee. So, when it comes time to pay the APC, the UC libraries will pay at least the first $1,000. If there’s any remaining balance due on the APC, the publisher’s payment system will ask if the UC author has grant funding available to cover the remainder. If the UC author cannot contribute the remaining balance, the UC libraries will pay the entire APC on their behalf. (Note: there are a few instances where the UC libraries will contribute a maximum of $1,000 toward the APC, such as Nature-branded titles.)

The UC maintains an updated list of Publisher OA Agreements and Discounts where you can explore which journals are available for partial or full APC coverage under the open access agreements.

The UC Berkeley Library-specific fund can reimburse open access fees for other fully open access journals

UC Berkeley’s Library also has a campus open access fund that UCB authors can use if they are publishing in a fully open access journal and are required to pay an APC. The Berkeley Research Impact Initiative (BRII) is open to any current UC Berkeley faculty, graduate student, postdoc, or academic staff who does not have other sources of funds to pay article processing charges. The BRII fund is available for journals other than those with which the UC has entered into a systemwide transformative open access agreement. 

For BRII APC coverage to apply, the entire journal must be freely available to the public without subscription fees. BRII cannot cover fees for publishing in “hybrid” OA journals—which are subscription-based journals that only offer open access options if an author decides to pay an additional fee to make their individual article open access. BRII reimbursements are capped at $2,500 per article, and a UC Berkeley author can use BRII funds once per fiscal year. 

How to plan in advance 

If your research is grant funded, it is important to think about publishing costs at the beginning of your research cycle and account for them in your grant applications and annual research budgeting. For grant recipients (such as researchers with funding from NIH, NSF, etc.), open access publishing costs generally are considered an allowable direct expense unless funders explicitly prohibit them. For more information on how and why to plan in advance, check out the Open Access Fact Sheet for Researchers Applying for Grants.

First page image of the "Open Access Fact Sheet for Researchers Applying for Grants" infosheet. The image is linked to the guide at https://guides.lib.berkeley.edu/ld.php?content_id=72496589
Click the image above to view the full guide “Open Access Fact Sheet for Researchers Applying for Grants”

Planning in advance allows you to be a partner in the publishing process. It allows the UC libraries to cover the first $1,000 of your article processing charge and, where possible, allows you to use grant or research funds to cover the rest. The more researchers are able to contribute, the farther the UC agreements can go in publishing more articles open access, and the better UC libraries are able to help provide financial support to researchers who do not have specific access to grant funds.

Most of the UC transformative open access agreements are set up to cover the full article processing charge should UC authors not have research or grant funds to contribute to making their journal articles open access. But there are a few journal titles and series within transformative agreements for which the libraries were unable to negotiate full coverage. For example, if a UC author has an article accepted in Nature Communications, the UC libraries cover only the first $1,000 of the article processing charge through the terms of the UC-Springer Nature transformative open access agreement. Since the current APC for Nature Communications is $6,290, the UC author must pay the remainder of the fee ($5,290).

Another instance in which an author may need to pay a balance is when the author is publishing in a fully open access journal that is not covered by a transformative agreement at all, and that journal’s article processing charge exceeds what can be covered through the BRII program. For instance, if a UC Berkeley author has an article accepted for publication in JAMA Network Open, the BRII program is capped at covering $2,500 of the article processing charge. Since the APC for JAMA Network Open is $3,000, the UC Berkeley author must pay the remainder of the fee ($500).

Since both of the examples above involve journals that require an APC in order to publish, authors are responsible for securing the remainder of any publishing fees when the open access publication costs exceed the support available from the UC libraries (or the UC Berkeley Library).
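
For authors who want to estimate their own share before submitting, the arithmetic in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and field names are hypothetical, the dollar figures are the ones quoted in this post ($1,000 base contribution, $2,500 BRII cap), and the sketch simplifies by assuming an author with funds pays the entire remainder. Actual coverage depends on the terms of each agreement.

```python
from dataclasses import dataclass

# Dollar figures quoted in this post; all names below are hypothetical.
UC_BASE_CONTRIBUTION = 1_000  # UC libraries pay at least the first $1,000
BRII_CAP = 2_500              # BRII reimbursement cap per article


@dataclass
class Split:
    library: int  # amount paid by the UC libraries or the BRII fund
    author: int   # amount left for the author to pay


def agreement_split(apc: int, author_has_funds: bool, capped_title: bool = False) -> Split:
    """Estimate the split for a journal under a systemwide OA agreement."""
    base = min(apc, UC_BASE_CONTRIBUTION)
    # Simplifying assumption: an author with funds pays the whole remainder.
    # For capped titles the libraries contribute the base amount only.
    if capped_title or author_has_funds:
        return Split(library=base, author=apc - base)
    # No author funds and no cap: the libraries pay the entire APC.
    return Split(library=apc, author=0)


def brii_split(apc: int) -> Split:
    """Estimate the split for a fully OA journal reimbursed through BRII."""
    covered = min(apc, BRII_CAP)
    return Split(library=covered, author=apc - covered)


# The two worked examples from this post.
print(agreement_split(6_290, author_has_funds=False, capped_title=True))
print(brii_split(3_000))
```

The first call reproduces the Nature Communications example (the author owes $5,290), and the second reproduces the JAMA Network Open example (the author owes $500).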

Need more help?


Come Help Us Create Wikipedia and Create Change, Edit by Edit, on February 15, 2023!

Screenshot of Wikipedia Entry for the Movie Tár 1-20-23

Wikipedia has become so central to our lives that we count on it to represent reality and solid fact. When we encounter a new phenomenon, we check out our trusty online friend for more information. So, it was fascinating to me recently to see the lines blur between fiction and reality, when Wikipedia was used as a visual and social cue in the movie Tár, starring Cate Blanchett, about a famed female conductor. In the movie, one of the clues to the coming turbulence in Lydia Tár’s life is a screen capture of a mystery editor changing items on the conductor’s Wikipedia entry. It looked and felt so real, the filming and Blanchett’s performance so rivetingly vivid, that many people believed the film was a biopic of a real person. As Brooke LaMantia wrote in her article, “No, Lydia Tar is Not Real”:

“When I left the theater after watching Tár for two hours and 38 minutes, I immediately fumbled for my phone. I couldn’t wait to see actual footage of the story I had just seen and was so ready for my Wikipedia deep dive to sate me during my ride home. But when I frantically typed “Lydia Tar?” into Google as I waited for my train, I was greeted with a confusing and upsetting realization: Lydia Tár is not real…the film’s description on Letterboxd — “set in the international world of classical music, centers on Lydia Tár, widely considered one of the greatest living composer/conductors and first-ever female chief conductor of a major German orchestra” — is enough to make you believe Tár is based on a true story. The description was later added to a Wikipedia page dedicated to “Lydia Tár,” but ahead of the film’s October 28 wide release, that page has now been placed under a broader page for the movie as a whole. Was this some sort of marketing sleight of hand or just a mistake I stumbled upon? Am I the only one who noticed this? I couldn’t be, right? I thought other people had to be stuck in that same cycle of questioning: Wait, this has to be real. Or is it? She’s not a real person?”

Wikipedia is central to LaMantia’s questioning!  While it’s easy to understand people’s confusion in general, the Tár Wikipedia page, created by editors like you and like me, is very clear that this is a film, at least as of today’s access date, January 20, 2023… On the other hand, did you know you can click on the “View History” link on the page, and see every edit that has been made to it, since it was created, and who made that edit?  If you look at the page resulting from one of the edits from October 27, 2022, you can see that it does look like Tár is a real person, and in fact, a person who later went on to edit this entry to make it clearer wrote, “Reading as it was, it is not clear if Lydia actually exists.”  Maybe I should write to LaMantia and let her know.

I tell this story to show that clearly, Wikipedia is a phenomenon, and a globally central one, which makes it all the more amazing that it is created continuously, edit by edit, editor by editor.  There are many ways in which our own and your own edits can create change, lead to social justice, correct misinformation and more.  While it’s easy to get lost in the weeds of minute changes to esoteric entries, it’s also possible to improve pages on important figures in real-life history and bring them into our modern narrative and consciousness.  And it’s easy to do!

If you are interested in learning more, and being part of this central resource, we warmly welcome you and invite you to join us on Wednesday, February 15, from 1-2:30 for our 2023 Wikipedia Editathon, part of the University of California-wide 2023 Love Data Week. No experience is required; we will teach you all you need to know about editing! (But if you want to edit with us in real time, please create a Wikipedia account before the workshop.) The link to register is here, and you can contact any of the workshop leaders (listed on the registration page) with questions. We look forward to editing with you!