International Collaboration (VŠE Prague, UT Austin, UC Berkeley) Builds Agentic AI System for CIA FOIA Archives

Prague / Austin / Berkeley — A new international research collaboration has developed and tested a multi-stage “agentic AI” system capable of extracting structured historical knowledge from large, unstructured digital archives. Using declassified CIA documents as a case study, the research demonstrates how artificial intelligence can help transform thousands of pages of scanned archival material into a coherent, time-resolved narrative, making Cold War-era intelligence reporting significantly more accessible for the wider public.

The study, published in The Electronic Library, focuses on one of the most dramatic turning points in modern European history: the Prague Spring reforms and the subsequent Soviet-led invasion of Czechoslovakia in 1968. By applying AI-driven processing to the CIA’s FOIA Electronic Reading Room, the team shows how today’s large language models (LLMs) can support the systematic reconstruction of historical reporting, while also highlighting the continued need for expert human oversight to preserve nuance, accuracy, and interpretive integrity.

This makes the research immediately useful not only for historians, but also for institutions deciding how to deploy AI responsibly at scale: libraries, archives, universities, and even public sector organizations managing large document collections.

A Collaboration Across Three Institutions and Disciplines

The project brings together expertise from three leading academic environments:

Prague University of Economics and Business contributed primarily to the design of the agentic workflow, the methodological framing of the solution, and the evaluation of the models, including the comparison of metrics, the analysis of the CIA FOIA Reading Room entity data structure, and the resulting information-ethical questions.

The University of Texas at Austin provided expert context in geopolitical and historical studies, which grounded the case study and helped interpret the results within the history of the Cold War.

UC Berkeley contributed a perspective from information science and librarianship, including work with digital collections and archival processing practice, which strengthened the applicability of the workflow for digital archives, libraries, and research organizations focused on history.

This cross-disciplinary cooperation reflects a growing reality: solving “big archive” challenges requires not only technical innovation, but also domain expertise and information science know-how.

From 2,122 Pages to Usable Knowledge: What the System Achieved

The research introduces an eight-agent workflow designed to mirror the real tasks historians and intelligence researchers face when working with archival material. The system was applied to 201 President’s Daily Brief documents, spanning January 1968 to January 1969, totaling 2,122 pages from the CIA’s FOIA Electronic Reading Room.

The AI pipeline produced three key outputs:

  1. A month-by-month narrative summary of intelligence reporting on Czechoslovakia
  2. A structured list of key named entities (people, organizations, events) organized chronologically
  3. A thematic quantification of reporting, measuring how much attention was given to political, societal, economic, and tactical military topics
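Output 3 comes down to computing each topic's share of the reporting. The sketch below is illustrative only. It assumes each relevant passage has already been labeled with one of the four topics upstream, by an LLM agent or a human coder, and it measures "attention" by character count. Both assumptions are ours, not the study's documented method, and the function name `topic_shares` is hypothetical.

```python
from collections import Counter

# The four topic labels named in the article.
TOPICS = ("political", "societal", "economic", "tactical military")


def topic_shares(passages: list[tuple[str, str]]) -> dict[str, float]:
    """Return each topic's share, in percent, of all labeled characters.

    `passages` is a list of (topic, text) pairs. Topic labeling is assumed
    to have happened upstream; weighting by character count is an
    illustrative choice, not necessarily the one used in the study.
    """
    chars: Counter[str] = Counter()
    for topic, text in passages:
        if topic not in TOPICS:
            raise ValueError(f"unknown topic: {topic!r}")
        chars[topic] += len(text)

    total = sum(chars.values())
    if total == 0:
        # No labeled text yet: report zero attention for every topic.
        return {topic: 0.0 for topic in TOPICS}
    # Counter returns 0 for topics that never appeared.
    return {topic: 100 * chars[topic] / total for topic in TOPICS}
```

Measuring by characters rather than by passage count keeps one long analytical paragraph from counting the same as a one-line mention. A different weighting would simply replace `len(text)`.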

To reduce noise and improve relevance, the system used OCR (optical character recognition) and automated filtering. Out of more than 1.37 million characters extracted via OCR, the pipeline isolated 265,550 characters of relevant intelligence content, an extraction rate of 19.3%. In other words, the filters discarded more than 80% of the raw text as irrelevant metadata or unrelated content.
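The extraction rate is the number of relevant characters divided by the number of raw OCR characters. The short check below is a sketch for sanity-testing the published figures. The helper names `extraction_rate` and `implied_raw_range` are hypothetical and do not come from the paper.

```python
def extraction_rate(relevant_chars: int, raw_chars: int) -> float:
    """Percentage of raw OCR characters kept after filtering."""
    if raw_chars <= 0:
        raise ValueError("raw_chars must be positive")
    return 100 * relevant_chars / raw_chars


def implied_raw_range(
    relevant_chars: int, rate_pct: float, decimals: int = 1
) -> tuple[float, float]:
    """Raw character totals consistent with a rate rounded to `decimals` places.

    A rate reported as 19.3% could be anything in [19.25, 19.35), so the raw
    total lies between relevant/0.1935 and relevant/0.1925.
    """
    half_step = 0.5 * 10 ** -decimals
    low_rate, high_rate = rate_pct - half_step, rate_pct + half_step
    # A higher rate implies a smaller denominator, and vice versa.
    return (100 * relevant_chars / high_rate, 100 * relevant_chars / low_rate)
```

Dividing 265,550 by exactly 1,370,000 gives about 19.4%. The reported 19.3% implies a raw total of roughly 1.372 to 1.379 million characters, which matches the article's "more than 1.37 million."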

Why This Matters for Society

This research tackles a quiet but serious societal problem: massive collections of historically valuable documents exist but remain effectively “locked away” because they are not machine-readable or searchable in meaningful ways.

Many declassified archives—especially scanned collections—are technically accessible but practically unusable without months (or years) of manual work. By introducing a replicable agentic workflow, the study shows how AI can:

  • expand access to historical primary sources
  • reduce routine work (searching, cleaning, extracting, organizing)
  • support transparency and democratic access to government records
  • enable deeper analysis of geopolitical crises through time-resolved narratives

The research is grounded in the democratic logic behind the US Freedom of Information Act (FOIA): that an informed public is essential for a functioning democracy. In this context, AI becomes more than a productivity tool—it becomes a method for scaling public understanding of complex historical events.

A Key Message: AI Helps, But Experts Still Matter

A central conclusion is clear and responsible: fully automated historical analysis is not yet feasible without risk. OCR errors, model instability, and interpretive ambiguity remain real challenges.

The authors emphasize the need for human-in-the-loop workflows, where AI accelerates extraction and structuring, while experts validate, interpret, and preserve historical nuance.

In other words: AI can carry the heavy boxes—but humans still need to read the labels.

Main Takeaways

This research offers a practical and forward-looking message for archives, universities, and society:

  • Agentic AI can turn unstructured archives into structured knowledge
  • Large language models can support digital humanities at scale
  • Model selection must be based on measurable trade-offs (quality, cost, speed, stability)
  • Human oversight remains essential for credibility
  • The approach is replicable beyond Cold War history, and can be extended to other FOIA collections and geopolitical contexts

About the Publication

The study was published in The Electronic Library under the title:
“A multi-stage agentic AI system for extracting information from large digital archives: case study on the Czechoslovak year 1968 in CIA’s FOIA collection.”

Reference: Černý J, Avramov K, Pendse LR (2026), “A multi-stage agentic AI system for extracting information from large digital archives: case study on the Czechoslovak year 1968 in CIA’s FOIA collection”. The Electronic Library, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/EL-06-2025-0272


New Open Access Resource in Eastern European and Slavic Studies: Estonia Digital Archive (1991-2009)

We have access to a fully digitized daily newspaper from Estonia (1991-2009) aimed at Russian-speaking citizens of Estonia.

Following Estonia’s independence in 1991, the Tallinn-based Russian-language broadsheet Estoniia was launched. Built by the staff of the former Sovetskaia Estonia, it stood out as one of the country’s pioneering private media outlets. The paper took inspiration from Western journalism, focusing its reporting on global and local politics, financial trends, and the arts. Under the financial patronage of Vitaly Khaitov, the publication grew significantly and rebranded as Vesti dnia in 2004, though it eventually folded in 2009 due to economic challenges and a competitive market.


Library Trial: Piatidnevka Digital Archive (DA-PIAT) through December 5, 2024

The UC Berkeley Library has started a trial of the Piatidnevka Digital Archive. The trial will end on December 5, 2024. Please send your feedback to the Librarian for Slavic, East European and Eurasian Studies at Lpendse at berkeley dot edu.

The trial can be accessed here.

The Piatidnevka (Пятидневка, “Five Day Week”) Digital Archive is an invaluable resource for scholars of early Soviet history. Published six times a month between 1929 and 1931, this illustrated journal offers critical insight into the Soviet Union’s brief but notable experiment with a five-day workweek of four workdays followed by a day of rest. The initiative reflected the broader Soviet aim of dismantling traditional societal structures in favor of new ones. The archive is rich in visual and textual content, including striking artistic photographs, articles, editorials, and commentaries that provide first-hand accounts of this significant phase in Soviet history.
The title page of the July 1930 issue of Piatidnevka.

Library Trial: Brill’s British Intelligence on Russia in Central Asia, c. 1865–1949

The UC Berkeley Library has started a thirty-day trial of Brill’s British Intelligence on Russia in Central Asia, c. 1865–1949 database. The trial ends on November 17, 2024.

One may access the trial here: Brill’s British Intelligence on Russia in Central Asia.

If you are accessing the resource from off campus, please log in through the library proxy or VPN.

According to the publisher’s description, the database contains the following primary sources:

Michell’s Russian Abstracts

During the 1870s and 1880s, the India Office Political and Secret Department considered the Russian and Central Asian question so vital that it employed an interpreter, Robert Michell, to review and translate Russian printed reports and extracts from Russian newspapers and other publications. Newspapers and journals regularly monitored included the Moscow Gazette, Turkestan Gazette, Journal de St Petersbourg, Russian Invalid, St Petersburg Gazette, Golos, and Novoye Vremia.

Sample record from Michell’s Russian Abstracts and Memories, 1872-1883:

Section: Michell’s Russian Abstracts and Memories, 1872-1883
Year: 1879
Institution: London: War Office, Intelligence Division
Dates: Jan 1879-Dec 1880
Physical Description: 206 ff
British Library File Number: L/P&S/20/RUS/4
Microform Collection: fiche 34-38 (12-16) | reel 4
Scope and Content: includes:
  • Captain Kuropatkin’s itineraries of routes in Kashgaria: from Osh to Kashgar, traversed by the Russian Mission under Colonel Kuropatkin in October 1876; from the city of Kashgar to the city of Aksu, November to December 1876
  • Bykof’s survey of the upper course of the Oxus, from the Turkestan Gazette, May 1879
  • Turcomania and the Turcomans, by Captain Kuropatkin, from the Russian Military Journal, 1879
  • Colonel Grodekof’s journey from Tashkend through Mazar-i-Sharif, Balkh and Herat to Persia, from the Novoye Vremia, 1880

Political and Secret Memoranda

At about the same time, as a result of the increasing quantity of intelligence now being regularly received, the India Office Political and Secret Department began to produce printed memoranda to provide ministers with easily digestible précis of the information they needed to formulate policy. For officials in India and London, processing information from the frontiers and providing background papers for successive incoming governments and their ministers became an almost full-time occupation. The Memoranda were arranged and numbered by contemporary India Office officials in an alphanumeric sequence that reflected the geographical subject area. Memoranda relating to Central Asia, which included items reflecting the great political debate and guessing game over the nature of Russian intentions in the region, were usually filed in series “C”.

Political and Secret Files on Soviet Central Asia

Although Anglo-Russian rivalry officially ended with the Convention of 1907, Russian ascendancy in Central Asia continued to interest the British imperial administrations. The two powers confronted each other again after the First World War and the Russian Revolution. With the creation of Soviet Socialist Republics in the period between the two World Wars, the British rulers of India were increasingly concerned about the infiltration of communist and nationalist agents and ideas into Indian politics. During this period, a new generation of British military and political intelligence officers, spies, and adventurers made courageous, sometimes unofficial, journeys into the Central Asian republics and beyond into Sinkiang. A British Indian agent was stationed at Kashgar in 1893, and in 1911 the post was upgraded to a Consulate-General. Kashgar became the listening post and source of regular intelligence briefings, political diaries, and trade reports.

Provenance and Archival Background

The archives of the India Office Political and Secret Department (and Military Department) form part of the Oriental and India Office Collections (OIOC) now within the Asia, Pacific, and Africa Collections at the British Library. The Political and Secret Department papers and printed material have now been catalogued under the OIOC reference L/PS. Military Department papers are located under the reference L/MIL.


Local and Independent Ukrainian Newspapers on Global Press Archive Electronically Available

The 1990s and early 2000s marked a turbulent period in Ukraine’s history due to the fall of the Soviet Union and the emergence of an independent Ukraine. Despite gaining free speech and property rights, citizens faced economic hardships. Corruption scandals and the murder of journalist Georgiy Gongadze in 2000 sparked nationwide protests against the political elite. The Local and Independent Ukrainian Newspapers collection covers this era up to the Orange Revolution (2004–2005), offering insights from over 900 newspapers across 340 cities, reflecting regional and ethnic dynamics. The collection includes publications in Ukrainian, Russian, and other languages like Armenian, German, Polish, etc., providing a detailed view of historical events. Access to this database is supported by the Center for Research Libraries and its members.

One can access this collection here.


New Digital Resource: Собрание законов и распоряжений правительства РСФСР и СССР = Collection of Laws and Orders of the Government of the RSFSR and the USSR

Recently, in light of the Russian invasion of Ukraine, with almost everything Russian being canceled in society at large, I wanted to bring to our readers’ attention a new digital resource: the Collection of laws and orders of the government of the RSFSR and the USSR. The resource is in Russian and was created by the Elektronnaia biblioteka istoricheskikh dokumentov (Электронная библиотека исторических документов, Electronic Library of Historical Documents).

The source provides access to digital copies of the laws and various orders of the Russian Soviet Federative Socialist Republic and the Soviet Union. I hope historians of the Soviet Union and the Russian Federation will find this resource of academic interest.

One can search within the text using specific keywords.

This picture shows the landing page of the 1918 volume of the compendium of laws of the RSFSR.

Center for Research Libraries releases Soviet-Era Ukrainian Newspapers Online

The Center for Research Libraries, in collaboration with the Global Press Archive of East View, has released its latest digital collection of select Soviet-era Ukrainian newspapers. The collection can be accessed here: https://gpa.eastview.com/crl/seun/ or here.

Soviet Era Ukrainian Newspaper project’s landing page. These are digital copies.

About the collection:

The early 20th century was a crucial time in Ukraine’s history, marked by attempts to establish an independent state, leading to the Ukrainian War of Independence. By 1922 the conflict had ended with Ukrainian territory divided between the Second Polish Republic in western Ukraine and the Ukrainian Soviet Socialist Republic in the rest of the country.

Following this, rapid Soviet collectivization in the Ukrainian SSR triggered the Holodomor, a famine that began in 1932 and claimed millions of lives.

The Soviet-Era Ukrainian Newspapers (SEUN) collection, with over 50,000 pages and five titles, documents Ukraine’s history during this turbulent period, including events leading up to WWII. It includes newspapers from Kyiv, Kharkiv, and Lviv, featuring content in both Ukrainian and Russian.


Sovetskii Ekran (Soviet Film) Digital Archive at UC Berkeley Library

The Library has purchased the digital archive of the Soviet film magazine Sovetskii Ekran. The archive provides access to the full text of journal issues published from 1925 to 1998.

Below is a screenshot of the landing page of the Sovetskii Ekran archive.

The landing page of the Sovetskii Ekran Digital Archive (above)

Access it here

At the time of writing, issues had been digitized through 1970, and digitization of the remaining issues was in progress.

Sovetskii Ekran, Issue no. 1 (1970)

About the journal:

Soviet Screen was a magazine in the USSR that ran from 1925 to 1998 (with a break from 1941 to 1957). It covered Soviet and foreign films and cinema history and published film criticism. Each year it also ran reader polls to pick the best film, actor, actress, children’s film, and music film.

The magazine appeared under different names over the years, including Screen Film Gazeta (1925), Cinema and Life (1929–1930), Proletarian Cinema (1931–1939), and Screen (1991–1997). Before 1992, it was affiliated with the Union of Cinematographers of the USSR and the USSR State Committee for Cinematography.

In 1984, its print run was 1.9 million copies. In 1991, under editor Victor Dyomin, the magazine was renamed Screen and switched to monthly publication, appearing less often than before. It continued as Screen until 1997, then for a few months in 1997–1998 returned to its old name, Soviet Screen. It could not survive the financial troubles of 1998 and ceased publication (Source: Wikipedia).


Library Trial: Znamia Digital Archive (Soviet-era periodical)

At the library, we have set up a thirty-day trial of Znamia Digital Archive through November 18, 2023.

The extensive archive of Znamia (Знамя, Banner), a highly regarded Soviet/Russian “thick journal” (tolstyi zhurnal), covers more than nine decades and is a rich source of intellectual and artistic contributions. This monthly publication has been a vibrant platform for literature, critical analysis, philosophy, and, at times, political commentary.

Originally launched in January 1931 as LOKAF (Локаф), an acronym for the Literary Association of the Red Army and Navy, the journal officially adopted the name Znamia, which translates to “Banner” in English, in 1933. In 1948, several members of the editorial staff were ousted for their perceived failure to adequately combat “cosmopolitanism.” Throughout its history, Znamia has played a crucial role in publishing the works of renowned authors such as Anna Akhmatova, Alexander Tvardovsky, Yevgeny Yevtushenko, Konstantin Paustovsky, Yuri Kazakov, and Yuri Trifonov.

During the era of Perestroika, starting in 1986, Znamia underwent a significant transformation and became one of Russia’s most widely read literary journals, serving as a herald of the Perestroika movement.
The landing page of the Znamia Digital Archive

An issue of Znamia for December 1947

Access Link: https://libproxy.berkeley.edu/login?qurl=https%3A%2F%2Fdlib.eastview.com%2Fbrowse%2Fudb%2F6250


Primary sources: Russian language historical ebook collections

This post highlights some of the Library’s acquisitions of Russian-language historical ebook collections that may have escaped your notice.

Soviet Anti-Religious Propaganda ebook collection

East View has digitized a collection of 280 ebooks that are emblematic of Soviet anti-religious fervor. Published mainly in the 1920s and 1930s on a variety of atheist or anti-religious topics, they include titles such as Christianity versus Communism, Church versus Democracy, and The Trial of God.

Early Soviet Cinema

Another East View collection of 116 ebooks, originally published from 1928 to 1948, covers the golden age of Soviet cinema.

Russian Avant-garde Online

An ebook collection of 778 works from Brill Online, representing all schools of the Russian literary avant-garde, with most works published between 1910 and 1940. According to the publisher, “the strength of this collection is in its sheer range. It contains many rare and intriguingly obscure books, as well as well-known and critically acclaimed texts, almanacs, periodicals, literary manifests. This makes it a gold mine for art historians and literary scholars alike. Represented in it are more than 30 literary groups without which the history of twentieth-century Russian literature would have been very different. Among the groups included are the Ego-Futurists and Cubo-Futurists, the Imaginists, the Constructivists, the Biocosmists, and the infamous nichevoki – who, in their most radical manifestoes, professed complete abstinence from literary creation.”