International Collaboration (VŠE Prague, UT Austin, UC Berkeley) Builds Agentic AI System for CIA FOIA Archives

Prague / Austin / Berkeley — A new international research collaboration has developed and tested a multi-stage “agentic AI” system capable of extracting structured historical knowledge from large, unstructured digital archives. Using declassified CIA documents as a case study, the research demonstrates how artificial intelligence can help transform thousands of pages of scanned archival material into a coherent, time-resolved narrative, making Cold War-era intelligence reporting significantly more accessible for the wider public.

The study, published in The Electronic Library, focuses on one of the most dramatic turning points in modern European history: the Prague Spring reforms and the subsequent Soviet-led invasion of Czechoslovakia in 1968. By applying AI-driven processing to the CIA’s FOIA Electronic Reading Room, the team shows how today’s large language models (LLMs) can support the systematic reconstruction of historical reporting, while also highlighting the continued need for expert human oversight to preserve nuance, accuracy, and interpretive integrity.

This makes the research immediately useful not only for historians, but also for institutions deciding how to deploy AI responsibly at scale: libraries, archives, universities, and even public sector organizations managing large document collections.

A Collaboration Across Three Institutions and Disciplines

The project brings together expertise from three leading academic environments:

Prague University of Economics and Business contributed primarily to the design of the agentic workflow, the methodological framing of the solution, and the evaluation of the models, including the comparison of metrics, the analysis of the CIA FOIA Reading Room entity data structure, and the resulting information-ethical questions.

The University of Texas at Austin provided expert context in geopolitical and historical studies, which enabled grounding the case study and interpreting the results within the history of the Cold War.

UC Berkeley contributed a perspective from information science and librarianship, including work with digital collections and archival processing practice, which strengthened the applicability of the workflow for digital archives, libraries, and research organizations focused on history.

This cross-disciplinary cooperation reflects a growing reality: solving “big archive” challenges requires not only technical innovation, but also domain expertise and information science know-how.

From 2,122 Pages to Usable Knowledge: What the System Achieved

The research introduces an eight-agent workflow designed to mirror the real tasks historians and intelligence researchers face when working with archival material. The system was applied to 201 President’s Daily Brief documents, spanning January 1968 to January 1969, totaling 2,122 pages from the CIA’s FOIA Electronic Reading Room.

The AI pipeline produced three key outputs:

  1. A month-by-month narrative summary of intelligence reporting on Czechoslovakia
  2. A structured list of key named entities (people, organizations, events) organized chronologically
  3. A thematic quantification of reporting, measuring how much attention was given to political, societal, economic, and tactical military topics

To reduce noise and improve relevance, the system used OCR (optical character recognition) and automated filtering. Out of more than 1.37 million characters extracted via OCR, the pipeline isolated 265,550 characters of relevant intelligence content, achieving an extraction rate of 19.3%—meaning over 80% of raw text was correctly removed as irrelevant metadata or unrelated content.
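The arithmetic behind the extraction rate can be checked with a minimal sketch. The article gives the relevant character count exactly but the OCR total only as "more than 1.37 million", so the total used here is an assumed lower bound, and the function and constant names are illustrative rather than taken from the study.

```python
def extraction_rate(relevant_chars: int, total_chars: int) -> float:
    """Return the share of OCR output kept as relevant content (0.0 to 1.0)."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return relevant_chars / total_chars


# Figure reported in the article.
RELEVANT_CHARS = 265_550
# Assumption: the article says "more than 1.37 million" characters,
# so this is a lower bound on the true OCR total, not the exact value.
TOTAL_CHARS_LOWER_BOUND = 1_370_000

rate = extraction_rate(RELEVANT_CHARS, TOTAL_CHARS_LOWER_BOUND)
print(f"kept: {rate:.1%}, removed: {1 - rate:.1%}")
```

Because the assumed total is a lower bound, the computed share is a slight upper bound on the true rate: it comes out just under 19.4%, which is consistent with the reported 19.3% if the actual OCR total was somewhat above 1.37 million characters. Either way, more than 80% of the raw text was filtered out.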

Why This Matters for Society

This research tackles a quiet but serious societal problem: massive collections of historically valuable documents exist but remain effectively “locked away” because they are not machine-readable or searchable in meaningful ways.

Many declassified archives—especially scanned collections—are technically accessible but practically unusable without months (or years) of manual work. By introducing a replicable agentic workflow, the study shows how AI can:

  • expand access to historical primary sources
  • reduce routine work (searching, cleaning, extracting, organizing)
  • support transparency and democratic access to government records
  • enable deeper analysis of geopolitical crises through time-resolved narratives

The research is grounded in the democratic logic behind the US Freedom of Information Act (FOIA): that an informed public is essential for a functioning democracy. In this context, AI becomes more than a productivity tool—it becomes a method for scaling public understanding of complex historical events.

A Key Message: AI Helps, But Experts Still Matter

A central conclusion is clear and responsible: fully automated historical analysis is not yet feasible without risk. OCR errors, model instability, and interpretive ambiguity remain real challenges.

The authors emphasize the need for human-in-the-loop workflows, where AI accelerates extraction and structuring, while experts validate, interpret, and preserve historical nuance.

In other words: AI can carry the heavy boxes—but humans still need to read the labels.

Main Takeaways

This research offers a practical and forward-looking message for archives, universities, and society:

  • Agentic AI can turn unstructured archives into structured knowledge
  • Large language models can support digital humanities at scale
  • Model selection must be based on measurable trade-offs (quality, cost, speed, stability)
  • Human oversight remains essential for credibility
  • The approach is replicable beyond Cold War history, and can be extended to other FOIA collections and geopolitical contexts

About the Publication

The study was published in The Electronic Library under the title:
“A multi-stage agentic AI system for extracting information from large digital archives: case study on the Czechoslovak year 1968 in CIA’s FOIA collection.”

Reference: Černý J, Avramov K, Pendse LR (2026), “A multi-stage agentic AI system for extracting information from large digital archives: case study on the Czechoslovak year 1968 in CIA’s FOIA collection”. The Electronic Library, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/EL-06-2025-0272


Call for Papers for the Central Asian Studies Conference

Central Asian Studies Society, University of Chicago                                           
6031 S Ellis Ave, Chicago, IL 60637, US   cass.uofc@gmail.com

We are excited to announce the Central Asian Studies Conference at the University of Chicago, organized by the university’s Central Asian Studies Society and taking place on April 17–18, 2026.
About the Conference. Throughout Central Asia, embodied culture is expressed through art and culture: oral traditions, written poetry and literature, textiles, music, and many other media. Creative acts and works have been intertwined with collective experiences ranging from celebrations to invasions to revolutions, working to represent and shape memory and identity. Our conference centers reflections on art, music, oral traditions, literature, and other cultural practices not only as objects of study, but also as sources of inspiration, tangible connections to the past, and means to understand the present. We are creating a space for young researchers interested in matters of culture and identity to meet, learn about, and learn from each other.
Call for Papers. We are now accepting abstracts of papers, mainly from graduate students, but also from postdoctoral fellows, faculty members, and independent scholars. We invite historians, linguists, anthropologists, art historians, literary scholars, sociologists, musicologists, and scholars of religion whose work engages with Central Asia, conceived broadly (from the Mongolian Plateau in the east to the Urals in the west, from Afghanistan in the south to the Altai Mountains in the north), from late antiquity to the present.
We particularly encourage submissions related to this inaugural conference’s theme: “Voices through Art and Culture: Identity Formation in Central Asia, from Music to Architecture.” What can art and culture tell us about the process of identity formation? What is the relationship between culture and politics? How did responses to historical events that affected the whole of Central Asia, in the political, ecological, and economic realms, differ and take shape in forms of art and culture? How do art and culture reflect Central Asianness, whether as a unified identity and/or a condition of great diversity and difference?
In the current climate of political instability, both globally and in the region, this Conference aims to delve into the historical practice of artistic and cultural responses and to help us investigate the current time: how is identity being transformed and reflected in modern art and cultural traditions? We believe that, especially at such a time, it is important to look back at the roots of identity and reevaluate them. And there is no better tool for that than looking into art and culture.
Keynote Speakers: The keynote speakers for the Conference are Professor Theodore C. Levin, a distinguished scholar of ethnomusicology, and Gulnur Mukazhanova, a prominent artist from Kazakhstan.
Dr. Theodore C. Levin is the Arthur R. Virgin Professor of Music at Dartmouth College and author of the book The Music of Central Asia. He is a longtime student of music, expressive culture, and traditional spirituality in Central Asia and Siberia. Levin served as the first executive director of the Silk Road Project, founded by cellist Yo-Yo Ma. His research and advocacy activities focus on the role of arts and culture in international development, and on the preservation and revitalization of musical heritage.
Gulnur Mukazhanova is a distinguished artist, born in Kazakhstan and based in Berlin, who weaves together Central Asian heritage with contemporary artistic enquiry. Through textiles and symbolic materials, she evokes layers of cultural and historical memory. Her works unfold as dialogues between suppressed traditions and today’s shifting realities, reflecting on postcolonial experience, feminism, and globalization. Her recent solo exhibitions include Bosağa – Transition. The Weave of Ancestral Memory at the Tselinny Center of Contemporary Culture, Almaty (2025); Öliara: The Dark Moon at Mimosa House, London (2022); and The Space of Silence at Aspan Gallery, Almaty (2021).
Submissions. Please send submissions electronically to caconferenceuofc@gmail.com no later than Sunday, February 1, 2026. Please include your name, institutional affiliation, program of study or position, a 250-word abstract, and a tentative title. If you are unsure about the suitability of your topic, please feel free to email us at the above address. Applicants will hear back from us by late February 2026.
Selected papers will be grouped into panels of three. Participants should be prepared to deliver a 20-minute presentation, followed by a discussant-led Q&A. Written papers must be circulated to the discussant and fellow members of the panel at least two weeks before the conference.
Limited funds for travel will be available to presenters without access to institutional funding. Please indicate if you are interested in being considered for this funding in your email.
Please circulate this widely! For questions and accessibility concerns, please write to caconferenceuofc@gmail.com.
A performance by the Tuvan music trio Alash, also organized by the Central Asian Studies Society and taking place in Rockefeller Chapel, will conclude the conference.

Trial Access to the Africa Commons Digital Archival Collections

Trial access to the Africa Commons digital archival collections, produced by Coherent Digital, is available until January 31st. This resource provides access to books, magazines, newspapers, government documents, manuscripts, photographs, videos, and oral histories related to African history and culture. Africa Commons is a project that aims to enable Africa to easily control, digitize, and disseminate its cultural heritage, both within Africa and internationally.

Africa Commons comprises four distinct collections:

History and Culture, an index of open source materials related to African history and culture.

Black South African Magazines, published between 1937 and 1973 for Black audiences.

Southern African Films and Documentaries including propaganda, newsreels, documentaries, feature films, and interviews spanning the 1900s to the early 2000s.

The Hilary Ng’weno Archive, documenting the fifty-year career of the iconic Kenyan journalist, publisher, commentator, and public figure Hilary Ng’weno through his magazines, newspapers, television programs, and documentaries.

Send your feedback to mmckenzie@berkeley.edu.