Wrapping up our NEH-funded project to help text and data mining researchers navigate cross-border legal and ethical issues

Black and white photograph with grass and concrete with the word "finish" painted on the concrete in large capitalized letters.
Image via rawpixel, public domain

In August 2022, the UC Berkeley Library and Internet Archive were awarded a grant from the National Endowment for the Humanities (NEH) to study legal and ethical issues in cross-border text and data mining (TDM).

The project, entitled Legal Literacies for Text Data Mining – Cross-Border (“LLTDM-X”), supported research and analysis to address law and policy issues faced by U.S. digital humanities practitioners whose text data mining research and practice intersect with foreign-held or foreign-licensed content, or involve international research collaborations.

LLTDM-X is now complete, resulting in the publication of two resources: an instructive case study for researchers and a white paper. Both are explained in greater detail below.

Project Origins

LLTDM-X built upon the previous NEH-sponsored institute, Building Legal Literacies for Text Data Mining. That institute provided training, guidance, and strategies to digital humanities TDM researchers on navigating legal literacies for text data mining (including copyright, contracts, privacy, and ethics) within a U.S. context.

A common challenge highlighted during the institute was that TDM practitioners encounter expanding and increasingly complex cross-border legal problems. These include situations in which (i) the materials they want to mine are housed in a foreign jurisdiction, or are otherwise subject to foreign database licensing or laws; (ii) the human subjects they are studying, or who created the underlying content, reside in another country; or (iii) the colleagues with whom they are collaborating reside abroad, yielding uncertainty about which country’s laws, agreements, and policies apply.

Project Design

We designed LLTDM-X to identify and better understand the cross-border issues that digital humanities TDM practitioners face, with the aim of using these issues to inform prospective research and education. Secondarily, we hoped that LLTDM-X would also suggest preliminary guidance to include in future educational materials. In early 2023, we hosted a series of three online round tables with U.S.-based cross-border TDM practitioners and law and ethics experts from six countries. 

The round table conversations were structured both to illustrate the empirical issues that researchers face and to give practitioners the benefit of preliminary advice on legal and ethical challenges. After the round tables concluded, the LLTDM-X project team created a hypothetical case study that (i) reflects the observed cross-border LLTDM issues and (ii) contains preliminary analysis to facilitate the development of future instructional materials.

We also charged the experts with providing responsive and tailored written feedback to the practitioners about how they might address specific cross-border issues relevant to each of their projects.

Guidance & Analysis

Case Study

Extrapolating from the issues analyzed in the round tables, the practitioners’ statements, and the experts’ written analyses, the Project Team developed a hypothetical case study reflective of “typical” cross-border LLTDM issues that U.S.-based practitioners encounter. The case study provides basic guidance to support U.S. researchers in navigating cross-border TDM issues, while also highlighting questions that would benefit from further research. 

The case study examines cross-border copyright, contracts, and privacy & ethics variables across two distinct paradigms: first, a situation where U.S.-based researchers perform all TDM acts in the U.S., and second, a situation where U.S.-based researchers engage with collaborators abroad, or otherwise perform TDM acts both in the U.S. and abroad.

White Paper

The LLTDM-X white paper provides a comprehensive description of the project, including origins and goals, contributors, activities, and outcomes. Of particular note are several project takeaways and recommendations, which we hope will help inform future research and action to support cross-border text data mining. Our project takeaways touched on seven key themes: 

  1. Uncertainty about cross-border LLTDM issues indeed hinders U.S. TDM researchers, confirming the need for education about cross-border legal issues; 
  2. The expansion of education regarding U.S. LLTDM literacies remains essential, and should continue in parallel to cross-border education; 
  3. Disparities in national copyright, contracts, and privacy laws may incentivize TDM researcher “forum shopping” and exacerbate research bias;
  4. License agreements (and the concept of “contractual override”) often dominate the overall analysis of cross-border TDM permissibility;
  5. Emerging lawsuits about generative artificial intelligence may impact future understanding of fair use and other research exceptions; 
  6. Research is needed into issues of foreign jurisdiction, likelihood of lawsuits in foreign countries, and likelihood of enforcement of foreign judgments in the U.S. However, the overall “risk” of proceeding with cross-border TDM research may remain difficult to quantify; and
  7. Institutional review boards (IRBs) have an opportunity to explore a new role or build partnerships to support researchers engaged in cross-border TDM.

Gratitude & Next Steps

Thank you to the practitioners, experts, project team, and generous funding of the National Endowment for the Humanities for making this project a success. 

We aim to broadly share our project outputs to continue helping U.S.-based TDM researchers navigate cross-border LLTDM hurdles. We will continue to speak publicly to educate researchers and the TDM community regarding project takeaways, and to advocate for legal and ethical experts to take up the essential research questions and begin developing much-needed educational materials. And we will continue to encourage the integration of LLTDM literacies into digital humanities curricula, to facilitate both domestic and cross-border TDM research.

[Note: this content is cross-posted on the LLTDM blog.]


Upcoming Workshop: Can I Mine That? Should I Mine That? A Clinic for Copyright, Ethics & More in TDM Research

computer keyboard and mouse with title of the Digital Publishing Workshop Series

Workshop Date/Time: Wednesday, March 8, 2023, 11:00am–12:30pm

Register to receive Zoom link

If you are working on a computational text analysis project and have wondered how to legally acquire, use, and publish text and data, this workshop is for you! We will teach you five legal literacies (copyright, contracts, privacy, ethics, and special use cases) that will empower you to make well-informed decisions about compiling, using, and sharing your corpus. By the end of this workshop, and with a useful checklist in hand, you will be able to confidently design lawful text analysis projects or be well positioned to help others design such projects. Consider taking this workshop alongside Copyright and Fair Use for Digital Projects.

Please sign up today and join us online on March 8.


Law & ethics in research and archiving social media of Myanmar resistance

On March 9, 2021, the Center for Southeast Asian Studies, Institute of East Asian Studies, the Institute of South Asia Studies, and the Human Rights Center at UC Berkeley hosted the online symposium Scholar-Activism and the Myanmar Resistance. The event invited scholar-activists to analyze and strategize for resistance to Myanmar’s military coup. The Office of Scholarly Communication Services collaborated with Dr. Hilary Faxon, Ciriacy-Wantrup Postdoctoral Fellow at UC Berkeley, to organize an afternoon workshop to explore the law, ethics, methods, and goals of archiving social media coverage of the coup.

Faxon highlighted that in the months since the military seized power on February 1, 2021, the internet has become a key domain of struggle in Myanmar. The military has cut off internet access and, before being banned from the platform, used Facebook to disseminate misinformation. Meanwhile, democracy activists have used social media alongside the traditional tactics of street protests and general strikes to resist the regime.

The workshop brought together a diverse group of participants from across and beyond campus with perspectives from human rights, research and journalism, including WITNESS and Berkeley’s Human Rights Investigation Lab. Stacy Reardon, Literatures and Digital Humanities Librarian, discussed services and workshops offered by Digital Humanities at Berkeley, as well as tools used to conduct DH research, such as the Wayback Machine, Conifer, 4k download, Adobe Bridge, and others. 

The Office of Scholarly Communication Services provided an overview of how to navigate law and policy issues when researchers are scraping, archiving, or text mining third-party content, such as social media posts, website text or images, or articles from databases. We addressed common issues that arise in research and archiving, including copyright, license agreements and website terms of use, privacy questions, and ethical considerations.

Workshop discussions centered on a commitment to a shared ethics of care approach to using, sharing, and archiving social media content related to the coup. The ethics of care framework holds that what we do as information collectors or analyzers affects other people, particularly people with less structural power, and that we should care about those effects. This becomes immediately apparent when deciding whether or how to collect, process, and share potentially sensitive social media posts, images, and videos from the Myanmar coup, especially when doing so could have dire consequences for activists who are the subjects of those posts.

During the workshop, we talked about how the Library has adopted a form of ethics of care in our approach to making decisions about what collection materials we’ll digitize and put online. Our version of ethics of care is framed as a balancing principle: that is, we look to whether the value to researchers, the public, or cultural communities in digitizing and sharing the content outweighs the potential for harm or exploitation of people, resources, or knowledge.

Several takeaways emerged by the end of the workshop discussion:  

  • Protecting and defending human rights: Archiving material from social media, including videos, photos, and live streams, might help ensure perpetrators of violence are held accountable, but the production and circulation of such materials can also be highly incriminating for media creators and platform users.
  • Collecting is collaborative: Usage of archives is bound up with the intentions of those creating material, and so archiving requires an ongoing, bi-directional conversation between those creating content and those doing the archiving.
  • Circumstances change: Both ethical and organizational approaches should be discussed and decided in advance of archiving. But expect situations to change: what is safe and straightforward to keep today may be riskier tomorrow.
  • Capturing versus sharing: These are different processes, and “archiving” does not necessarily have to entail both. The benefits and risks associated with collecting data are distinct from those associated with sharing data or making it publicly available, so these processes should be considered separately.
  • Law and ethics: Regardless of what is allowed under U.S. copyright law, there may be other contracts and terms of service that restrict what you can do with materials. Moreover, collecting voluntarily released data may not violate legal privacy rights, but may still present ethical questions.
  • Data security: Develop a Data Management Plan that addresses organization and protection both during archiving and after the project is completed. Consider a special-purpose account for collaborations and data sharing.
  • Data hygiene: Don’t collect more than you need.
  • Practical strategies: Tools may depend on the specific goals of a researcher and the scale of the project. It is important to ask what, precisely, you mean when you say “archiving,” and what the purpose of creating your archive might be.
  • Seek out a community of practice to support and situate your efforts.

We hope the workshop helped researchers to better understand the legal and ethical considerations in collecting, processing, and sharing potentially sensitive social media content of events like the Myanmar resistance. The Library and a broad community of supporters are here to help scholars address these challenges and equip them to proceed with confidence, care, and sound practices. 


From the Director: October 2018

OHC Director Martin Meeker shares his work with the Oral History Association to update its core documents outlining best practices and ethical standards for the field. The committee, of which Meeker is a member, is seeking feedback on the draft documents, which are open for public comment through October 12, 2018.

Every decade or so, the Oral History Association (OHA) has convened a group of oral historians to examine, reconsider, and, often, redraft its core documents outlining best practices and ethical standards for the field. When Todd Moye assumed the presidency of OHA last fall, he announced that just such a project would be a key feature of his term. Soon a task force of fourteen members, including the excellent chairs Sarah Milligan and Troy Reeves, was established and a series of online meetings commenced. I was honored to be asked to serve on the task force and was very happy to work alongside so many accomplished scholars and dedicated oral historians.

Working fairly intensely for about nine months, the task force ultimately drafted six documents. Of those six, four are key: Core Principles, Statement on Ethics, Best Practices, and what the committee is calling “For Participants in Oral History Interviews.” All of the documents are available for everyone to read online, and the comment period remains open until October 12. Members of OHA will have the chance to give an up or down vote on the proposed new documents at the business meeting during the upcoming OHA annual meeting on Saturday, October 13.

As a member of the task force and as a deeply committed oral historian, I want to encourage everyone to engage with these documents both now and when, presumably, they are adopted. Unlike some previous iterations of these documents, the 2018 editions offer a full-scale rethinking and rewrite of what came before. While there was much useful and insightful material in the previous versions, and they served the organization well for years, many task force members thought that those documents attempted to do both “too much” and “too little.” I think that means that there were some pretty detailed prescriptions that were difficult to apply widely (“too much”), and yet much of what was written was a bit too vague and thus difficult to implement in specific settings (“too little”). The current task force sought to remedy this, and we certainly hope that readers today agree.

The task force wrestled with a number of other questions that are either new or have become newly important over the past decade (the current version was adopted in October 2009). Not surprisingly, technology is at the top of the list. One way in which we attempted to deal with continuous technological innovation was to think about the universal questions and issues that the new innovations have raised. In other words, we avoided getting into the weeds and writing specific instructions for today’s situation, because we know things will continue to evolve, and at a rapid rate. Although oral historians have long been aware of the potential challenges and needs that come with interviewing across lines of difference, there is certainly a greater sensitivity to “privilege” today, and the task force kept these concerns foremost when doing our work. But as with technology, we tried to stay open-ended: rather than writing the documents to speak to only one type of difference, privilege, or associated challenge, we provided guidelines and insight into the best ways to handle sensitive relationships in a variety of situations.

When you read the documents, I encourage you to first read Sherna Berger Gluck’s “Introduction,” which provides a useful and tidy history of these documents over the decades, thus putting the newest versions in context. I think I can speak for my fellow task force members in saying that we hope the work we’ve done is well received and is seen as useful and valuable for, perhaps, the next 10 years.

 

Martin Meeker

Charles B. Faulhaber Director