In August 2022, the UC Berkeley Library and Internet Archive were awarded a grant from the National Endowment for the Humanities (NEH) to study legal and ethical issues in cross-border text and data mining (TDM).
The project, entitled Legal Literacies for Text Data Mining – Cross-Border (“LLTDM-X”), supported research and analysis to address law and policy issues faced by U.S. digital humanities practitioners whose text data mining research and practice intersects with foreign-held or -licensed content, or involves international research collaborations.
LLTDM-X built upon the previous NEH-sponsored institute, Building Legal Literacies for Text Data Mining. That institute provided training, guidance, and strategies to digital humanities TDM researchers on navigating legal literacies for text data mining (including copyright, contracts, privacy, and ethics) within a U.S. context.
A common challenge highlighted during the institute was the fact that TDM practitioners encounter expanding and increasingly complex cross-border legal problems. These include situations in which: (i) the materials they want to mine are housed in a foreign jurisdiction, or are otherwise subject to foreign database licensing or laws; (ii) the human subjects they are studying or who created the underlying content reside in another country; or, (iii) the colleagues with whom they are collaborating reside abroad, yielding uncertainty about which country’s laws, agreements, and policies apply.
We designed LLTDM-X to identify and better understand the cross-border issues that digital humanities TDM practitioners face, with the aim of using these issues to inform prospective research and education. Secondarily, we hoped that LLTDM-X would also suggest preliminary guidance to include in future educational materials. In early 2023, we hosted a series of three online round tables with U.S.-based cross-border TDM practitioners and law and ethics experts from six countries.
The round table conversations were structured to illustrate the empirical issues that researchers face, and also for the practitioners to benefit from preliminary advice on legal and ethical challenges. Upon the completion of the round tables, the LLTDM-X project team created a hypothetical case study that (i) reflects the observed cross-border LLTDM issues and (ii) contains preliminary analysis to facilitate the development of future instructional materials.
We also charged the experts with providing responsive and tailored written feedback to the practitioners about how they might address specific cross-border issues relevant to each of their projects.
Guidance & Analysis
Extrapolating from the issues analyzed in the round tables, the practitioners’ statements, and the experts’ written analyses, the Project Team developed a hypothetical case study reflective of “typical” cross-border LLTDM issues that U.S.-based practitioners encounter. The case study provides basic guidance to support U.S. researchers in navigating cross-border TDM issues, while also highlighting questions that would benefit from further research.
The case study examines cross-border copyright, contracts, and privacy & ethics variables across two distinct paradigms: first, a situation where U.S.-based researchers perform all TDM acts in the U.S., and second, a situation where U.S.-based researchers engage with collaborators abroad, or otherwise perform TDM acts in both U.S. and abroad.
The LLTDM-X white paper provides a comprehensive description of the project, including origins and goals, contributors, activities, and outcomes. Of particular note are several project takeaways and recommendations, which we hope will help inform future research and action to support cross-border text data mining. Our project takeaways touched on seven key themes:
- Uncertainty about cross-border LLTDM issues indeed hinders U.S. TDM researchers, confirming the need for education about cross-border legal issues;
- The expansion of education regarding U.S. LLTDM literacies remains essential, and should continue in parallel to cross-border education;
- Disparities in national copyright, contracts, and privacy laws may incentivize TDM researcher “forum shopping” and exacerbate research bias;
- License agreements (and the concept of “contractual override”) often dominate the overall analysis of cross-border TDM permissibility;
- Emerging lawsuits about generative artificial intelligence may impact future understanding of fair use and other research exceptions;
- Research is needed into issues of foreign jurisdiction, likelihood of lawsuits in foreign countries, and likelihood of enforcement of foreign judgments in the U.S. However, the overall “risk” of proceeding with cross-border TDM research may remain difficult to quantify; and
- Institutional review boards (IRBs) have an opportunity to explore a new role or build partnerships to support researchers engaged in cross-border TDM.
Gratitude & Next Steps
Thank you to the practitioners, experts, project team, and generous funding of the National Endowment for the Humanities for making this project a success.
We aim to broadly share our project outputs to continue helping U.S.-based TDM researchers navigate cross-border LLTDM hurdles. We will continue to speak publicly to educate researchers and the TDM community regarding project takeaways, and to advocate for legal and ethical experts to undertake the essential research questions and begin developing much-needed educational materials. And, we will continue to encourage the integration of LLTDM literacies into digital humanities curricula, to facilitate both domestic and cross-border TDM research.
[Note: this content is cross-posted on the LLTDM blog.]