Wrapping up our NEH-funded project to help text and data mining researchers navigate cross-border legal and ethical issues

Black and white photograph with grass and concrete with the word "finish" painted on the concrete in large capitalized letters.
Image via rawpixel, public domain

In August 2022, the UC Berkeley Library and Internet Archive were awarded a grant from the National Endowment for the Humanities (NEH) to study legal and ethical issues in cross-border text and data mining (TDM).

The project, entitled Legal Literacies for Text Data Mining – Cross-Border (“LLTDM-X”), supported research and analysis to address law and policy issues faced by U.S. digital humanities practitioners whose text data mining research and practice intersects with foreign-held or -licensed content, or involves international research collaborations.

LLTDM-X is now complete, resulting in the publication of an instructive case study for researchers and white paper. Both resources are explained in greater detail below.

Project Origins

LLTDM-X built upon the previous NEH-sponsored institute, Building Legal Literacies for Text Data Mining. That institute provided training, guidance, and strategies to digital humanities TDM researchers on navigating legal literacies for text data mining (including copyright, contracts, privacy, and ethics) within a U.S. context.

A common challenge highlighted during the institute was the fact that TDM practitioners encounter expanding and increasingly complex cross-border legal problems. These include situations in which: (i) the materials they want to mine are housed in a foreign jurisdiction, or are otherwise subject to foreign database licensing or laws; (ii) the human subjects they are studying or who created the underlying content reside in another country; or, (iii) the colleagues with whom they are collaborating reside abroad, yielding uncertainty about which country’s laws, agreements, and policies apply.

Project design

We designed LLTDM-X to identify and better understand the cross-border issues that digital humanities TDM practitioners face, with the aim of using these issues to inform prospective research and education. Secondarily, we hoped that LLTDM-X would also suggest preliminary guidance to include in future educational materials. In early 2023, we hosted a series of three online round tables with U.S.-based cross-border TDM practitioners and law and ethics experts from six countries. 

The round table conversations were structured to illustrate the empirical issues that researchers face, and also for the practitioners to benefit from preliminary advice on legal and ethical challenges. Upon the completion of the round tables, the LLTDM-X project team created a hypothetical case study that (i) reflects the observed cross-border LLTDM issues and (ii) contains preliminary analysis to facilitate the development of future instructional materials.

We also charged the experts with providing responsive and tailored written feedback to the practitioners about how they might address specific cross-border issues relevant to each of their projects.

Guidance & Analysis

Case Study

Extrapolating from the issues analyzed in the round tables, the practitioners’ statements, and the experts’ written analyses, the Project Team developed a hypothetical case study reflective of “typical” cross-border LLTDM issues that U.S.-based practitioners encounter. The case study provides basic guidance to support U.S. researchers in navigating cross-border TDM issues, while also highlighting questions that would benefit from further research. 

The case study examines cross-border copyright, contracts, and privacy & ethics variables across two distinct paradigms: first, a situation where U.S.-based researchers perform all TDM acts in the U.S., and second, a situation where U.S.-based researchers engage with collaborators abroad, or otherwise perform TDM acts in both U.S. and abroad.

White Paper

The LLTDM-X white paper provides a comprehensive description of the project, including origins and goals, contributors, activities, and outcomes. Of particular note are several project takeaways and recommendations, which we hope will help inform future research and action to support cross-border text data mining. Our project takeaways touched on seven key themes: 

  1. Uncertainty about cross-border LLTDM issues indeed hinders U.S. TDM researchers, confirming the need for education about cross-border legal issues; 
  2. The expansion of education regarding U.S. LLTDM literacies remains essential, and should continue in parallel to cross-border education; 
  3. Disparities in national copyright, contracts, and privacy laws may incentivize TDM researcher “forum shopping” and exacerbate research bias;
  4. License agreements (and the concept of “contractual override”) often dominate the overall analysis of cross-border TDM permissibility;
  5. Emerging lawsuits about generative artificial intelligence may impact future understanding of fair use and other research exceptions; 
  6. Research is needed into issues of foreign jurisdiction, likelihood of lawsuits in foreign countries, and likelihood of enforcement of foreign judgments in the U.S. However, the overall “risk” of proceeding with cross-border TDM research may remain difficult to quantify; and
  7. Institutional review boards (IRBs) have an opportunity to explore a new role or build partnerships to support researchers engaged in cross-border TDM.

Gratitude & Next Steps

Thank you to the practitioners, experts, project team, and generous funding of the National Endowment for the Humanities for making this project a success. 

We aim to broadly share our project outputs to continue helping U.S.-based TDM researchers navigate cross-border LLTDM hurdles. We will continue to speak publicly to educate researchers and the TDM community regarding project takeaways, and to advocate for legal and ethical experts to undertake the essential research questions and begin developing much-needed educational materials. And, we will continue to encourage the integration of LLTDM literacies into digital humanities curricula, to facilitate both domestic and cross-border TDM research.

[Note: this content is cross-posted on the LLTDM blog.]


Upcoming Workshop: Can I Mine That? Should I Mine That? A Clinic for Copyright, Ethics & More in TDM Research

computer keyboard and mouse with title of the Digital publishing Workshop Series

Workshop Date/Time: Wednesday, March 8, 2023, 11:00am–12:30pm

Register to receive Zoom link

If you are working on a computational text analysis project and have wondered how to legally acquire, use, and publish text and data, this workshop is for you! We will teach you 5 legal literacies (copyright, contracts, privacy, ethics, and special use cases) that will empower you to make well-informed decisions about compiling, using, and sharing your corpus. By the end of this workshop, and with a useful checklist in hand, you will be able to confidently design lawful text analysis projects or be well positioned to help others design such projects. Consider taking alongside Copyright and Fair Use for Digital Projects.

Please sign up today and join us online on March 8.


Event: Can We Afford Privacy from Surveillance? Do We Want To?

CAN WE AFFORD PRIVACY FROM SURVEILLANCE? DO WE WANT TO?
Speaker: Jeffrey MacKie-Mason
Wednesday, September 23, 2015, 4:10 pm – 5:30 pm
202 South Hall

The extent to which we are subject to surveillance — the collection of information about us, by government, commercial, or individual agents — is in large part an economic question. Surveillance takes effort and resources — spend more and we can do better surveillance. Protecting against surveillance also takes effort and resources. Given the state of technology, the amount of effort and money each side expends determines what is surveilled and what is kept private. As technology changes, both the cost and the desirability of surveillance, and protection against surveillance, change. We can confidently predict that information technology and communication costs will continue to decrease, and capabilities to surveil and protect against it will improve. What are the consequences for our privacy? Will we have a future with more or less privacy? Which do we want?

Bio: Jeffrey MacKie-Mason will be joining UC Berkeley on October 1 as University Librarian and Chief Digital Scholarship Officer. For the past 29 years, Jeff has been a faculty member at the University of Michigan, where he was the Arthur W. Burks Collegiate Professor of Information and Computer Science, and also a professor of economics and a professor of public policy. For the last five years he has been the dean of Michigan’s School of Information. Jeff has been a pioneering scholar of the economics of the Internet and online behavior and a frequent co-author with the Berkeley I School’s first dean, Hal Varian. He has also led the development of the incentive-centered design approach to online information services.