International Collaboration (VŠE Prague, UT Austin, UC Berkeley) Builds Agentic AI System for CIA FOIA Archives
Prague / Austin / Berkeley — A new international research collaboration has developed and tested a multi-stage “agentic AI” system capable of extracting structured historical knowledge from large, unstructured digital archives. Using declassified CIA documents as a case study, the research demonstrates how artificial intelligence can help transform thousands of pages of scanned archival material into a coherent, time-resolved narrative, making Cold War-era intelligence reporting significantly more accessible for the wider public.
The study, published in The Electronic Library, focuses on one of the most dramatic turning points in modern European history: the Prague Spring reforms and the subsequent Soviet-led invasion of Czechoslovakia in 1968. By applying AI-driven processing to the CIA’s FOIA Electronic Reading Room, the team shows how today’s large language models (LLMs) can support the systematic reconstruction of historical reporting, while also highlighting the continued need for expert human oversight to preserve nuance, accuracy, and interpretive integrity.
This makes the research immediately useful not only for historians, but also for institutions deciding how to deploy AI responsibly at scale: libraries, archives, universities, and even public sector organizations managing large document collections.
A Collaboration Across Three Institutions and Disciplines
The project brings together expertise from three leading academic environments:
Prague University of Economics and Business contributed primarily to the design of the agentic workflow, the methodological framing of the solution, and the evaluation of the models. That evaluation covered the comparison of metrics, the analysis of the entity data structure in the CIA FOIA Reading Room, and the resulting information-ethical questions.
The University of Texas at Austin provided expert context in geopolitical and historical studies, which grounded the case study and allowed the results to be interpreted within the history of the Cold War.
UC Berkeley contributed a perspective from information science and librarianship, including work with digital collections and archival processing practice, which strengthened the applicability of the workflow for digital archives, libraries, and research organizations focused on history.
This cross-disciplinary cooperation reflects a growing reality: solving “big archive” challenges requires not only technical innovation, but also domain expertise and information science know-how.
From 2,122 Pages to Usable Knowledge: What the System Achieved
The research introduces an eight-agent workflow designed to mirror the real tasks historians and intelligence researchers face when working with archival material. The system was applied to 201 President’s Daily Brief documents, spanning January 1968 to January 1969, totaling 2,122 pages from the CIA’s FOIA Electronic Reading Room.
The AI pipeline produced three key outputs:
- A month-by-month narrative summary of intelligence reporting on Czechoslovakia
- A structured list of key named entities (people, organizations, events) organized chronologically
- A thematic quantification of reporting, measuring how much attention was given to political, societal, economic, and tactical military topics
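The third output, thematic quantification, can be illustrated with a minimal sketch. It assumes each relevant passage has already been labeled with a month and one of the four themes, and it measures attention as a share of characters. The labeling step, the character-based measure, the theme keys, and the sample data are all illustrative assumptions, not the method described in the paper.

```python
from collections import defaultdict

# Theme keys are an assumption based on the four topics named above.
THEMES = ("political", "societal", "economic", "military")


def theme_shares(passages):
    """Return, per month, each theme's share of the labeled characters.

    `passages` is an iterable of (month, theme, text) tuples; this input
    format is hypothetical.
    """
    totals = defaultdict(lambda: dict.fromkeys(THEMES, 0))
    for month, theme, text in passages:
        if theme not in THEMES:
            raise ValueError(f"unknown theme: {theme!r}")
        totals[month][theme] += len(text)

    shares = {}
    for month, counts in totals.items():
        month_total = sum(counts.values())
        shares[month] = {
            theme: (count / month_total if month_total else 0.0)
            for theme, count in counts.items()
        }
    return shares


# Placeholder passages (NOT archive content), sized so the shares are easy to check.
sample = [
    ("1968-08", "political", "x" * 30),
    ("1968-08", "military", "x" * 50),
    ("1968-08", "economic", "x" * 20),
    ("1968-09", "societal", "x" * 10),
]
shares = theme_shares(sample)
for month in sorted(shares):
    print(month, shares[month])
```

Normalizing within each month keeps months with heavy reporting from dominating the comparison; a different unit, such as counts of passages or documents, would fit the same structure.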
The system first converted the scanned pages to text with OCR (optical character recognition) and then applied automated filtering to reduce noise and improve relevance. Out of more than 1.37 million characters extracted via OCR, the pipeline isolated 265,550 characters of relevant intelligence content, an extraction rate of 19.3%. In other words, more than 80% of the raw text was removed as irrelevant metadata or unrelated content.
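The extraction rate is a simple ratio of relevant characters to raw OCR characters. The raw total is reported here only as "more than 1.37 million", so the sketch below does not assume an exact figure. Instead it derives the range of raw totals consistent with a rate that rounds to 19.3%. The function names are illustrative.

```python
def extraction_rate(relevant_chars: int, raw_chars: int) -> float:
    """Fraction of raw OCR characters kept as relevant content."""
    if raw_chars <= 0:
        raise ValueError("raw_chars must be positive")
    return relevant_chars / raw_chars


def implied_raw_range(relevant_chars: int, rate_pct: float, decimals: int = 1):
    """Range of raw totals whose rate rounds to `rate_pct` at `decimals` places."""
    half_step = 0.5 * 10 ** -decimals          # 0.05 percentage points for one decimal
    low_rate = (rate_pct - half_step) / 100
    high_rate = (rate_pct + half_step) / 100
    # A higher rate implies a smaller raw total, so the bounds swap.
    return relevant_chars / high_rate, relevant_chars / low_rate


RELEVANT_CHARS = 265_550   # reported in the text
REPORTED_RATE_PCT = 19.3   # reported in the text

low, high = implied_raw_range(RELEVANT_CHARS, REPORTED_RATE_PCT)
removed_share = 1 - REPORTED_RATE_PCT / 100

print(f"Implied raw total: {low:,.0f} to {high:,.0f} characters")
print(f"Share of raw text removed: {removed_share:.1%}")
```

The implied range falls between roughly 1.372 and 1.379 million characters, which agrees with the "more than 1.37 million" figure and with the statement that over 80% of the raw text was filtered out.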
Why This Matters for Society
This research tackles a quiet but serious societal problem: massive collections of historically valuable documents exist but remain effectively “locked away” because they are not machine-readable or searchable in meaningful ways.
Many declassified archives—especially scanned collections—are technically accessible but practically unusable without months (or years) of manual work. By introducing a replicable agentic workflow, the study shows how AI can:
- expand access to historical primary sources
- reduce routine work (searching, cleaning, extracting, organizing)
- support transparency and democratic access to government records
- enable deeper analysis of geopolitical crises through time-resolved narratives
The research is grounded in the democratic logic behind the US Freedom of Information Act (FOIA): that an informed public is essential for a functioning democracy. In this context, AI becomes more than a productivity tool—it becomes a method for scaling public understanding of complex historical events.
A Key Message: AI Helps, But Experts Still Matter
A central conclusion is clear: fully automated historical analysis is not yet feasible without risk. OCR errors, model instability, and interpretive ambiguity remain real challenges.
The authors emphasize the need for human-in-the-loop workflows, where AI accelerates extraction and structuring, while experts validate, interpret, and preserve historical nuance.
In other words: AI can carry the heavy boxes—but humans still need to read the labels.
Main Takeaways
This research offers a practical and forward-looking message for archives, universities, and society:
- Agentic AI can turn unstructured archives into structured knowledge
- Large language models can support digital humanities at scale
- Model selection must be based on measurable trade-offs (quality, cost, speed, stability)
- Human oversight remains essential for credibility
- The approach is replicable beyond Cold War history, and can be extended to other FOIA collections and geopolitical contexts
About the Publication
The study was published in The Electronic Library under the title:
“A multi-stage agentic AI system for extracting information from large digital archives: case study on the Czechoslovak year 1968 in CIA’s FOIA collection.”
Reference: Černý J, Avramov K, Pendse LR (2026), “A multi-stage agentic AI system for extracting information from large digital archives: case study on the Czechoslovak year 1968 in CIA’s FOIA collection”. The Electronic Library, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/EL-06-2025-0272











