PhiloBiblon 2023 n. 3 (May): NEH support for PhiloBiblon and the Wikiworld

Metropolitan Museum X.430.1, f. 1r

We are delighted to announce that PhiloBiblon has received a two-year implementation grant from the Humanities Collections and Reference Resources program of the National Endowment for the Humanities to complete the mapping of PhiloBiblon from its almost forty-year-old relational database technology to the Wikibase technology that underlies Wikipedia and Wikidata. The project will start on the first of July and, Dios mediante, will finish successfully by the end of June 2025.

The fundamental problem is to map the 422,000+ records of PhiloBiblon’s bibliographies, with their complexly interrelated relational tables, to the triplestore structure of Wikibase. A triplestore relates two Items by means of a Property. Thus a Work is linked to an Author by the Property “written by.”
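The idea can be sketched in a few lines of Python. This is only an illustration of how a relational row becomes a set of triples, not the project's actual conversion code: the table layout, the field names, the identifiers "W1" and "A1", and the "label" property are all assumptions invented for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: an Item, a Property, and a value (often another Item)."""
    subject: str
    prop: str
    value: str


# Hypothetical relational rows; identifiers and field names are invented.
authors = [{"author_id": "A1", "name": "Alfonso X"}]
works = [{"work_id": "W1", "title": "Cantigas de Santa Maria", "author_id": "A1"}]


def rows_to_triples(works, authors):
    """Turn each row into label triples plus one triple per foreign key."""
    triples = []
    for author in authors:
        triples.append(Triple(author["author_id"], "label", author["name"]))
    for work in works:
        triples.append(Triple(work["work_id"], "label", work["title"]))
        # The foreign key in the Work row becomes a "written by" statement.
        triples.append(Triple(work["work_id"], "written by", work["author_id"]))
    return triples


for triple in rows_to_triples(works, authors):
    print(triple)
```

The point of the sketch is that a foreign-key column in a relational table turns into a Property linking two Items, while the remaining columns turn into statements about a single Item.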

We received an NEH Foundations grant for this project in 2021, as described in detail in PhiloBiblon 2021 (n. 3): PhiloBiblon y el mundo wiki: propuesta de una colaboración. Over the course of the last two years, the pilot project team has analyzed the data structures of PhiloBiblon’s ten relational tables (using BETA for the test cases) and worked out the procedures needed to convert them into triplestore structures. The team consists of Charles Faulhaber (PI); Patricia García Sánchez-Migallón and Almudena Izquierdo Andreu (both of whom hold doctorates from the UCM); Berkeley undergraduate Spanish and data science majors Julieta Soto, Serena Bai, Tina Lin, Cassandra Calciano, and Martín García Ángel; Max Ziff (data engineer); and Josep Formentí (user interface programmer).

Almudena and Patricia manually mapped more than 125 BETA records to FactGrid: PhiloBiblon as models for the automated processing of the rest. See for example the records for Alfonso X, BNE MSS/10069 (Cantigas de Santa Maria), and the 1497 edition of the translation of Boccaccio’s Fiammetta. These models have been key for establishing the semantic relations between PhiloBiblon’s data fields and the Properties and Items in FactGrid. In many cases appropriate properties did not exist and it was necessary to create them. For example, something as simple as the Watermark property was needed in order to identify the various watermark types set forth in PhiloBiblon’s controlled vocabulary.

Julieta Soto and Martín García Ángel attacked the problem of creating almost 900 FactGrid records for the controlled vocabulary terms in BETA. This meant in the first place a search in FactGrid to make sure that an equivalent term did not already exist, in order to avoid creating duplicate records. Then they had to situate the term in the FactGrid ontology by specifying it as a “basic object” (e.g., fruit) or identifying it as a subclass of an appropriate basic object, for example facsímil impreso as a subclass of facsímil. At the same time they had to link the record to the code in PhiloBiblon, BIBLIOGRAPHY*RELATED_BIBCLASS*FAP, identifying a record in the Bibliography table as a print facsimile, thereby making it possible to search for such items.
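The workflow just described (check for an existing term, place the new term in the ontology, link it to its PhiloBiblon code) can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the procedure or tooling the team actually used: the `VocabItem` class, the `add_term` function, the dictionary standing in for FactGrid, and the placeholder code for facsímil are all hypothetical. Only the code BIBLIOGRAPHY*RELATED_BIBCLASS*FAP comes from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VocabItem:
    label: str
    philobiblon_code: str
    subclass_of: Optional[str] = None  # None marks a "basic object"


def add_term(store, label, code, parent=None):
    """Add a controlled-vocabulary term unless an equivalent label exists.

    Returns (item, created). `store` is a plain dict standing in for the
    target database; a real duplicate check would also need to consider
    aliases and other languages.
    """
    key = label.casefold()
    if key in store:
        # An equivalent term already exists: reuse it, do not duplicate.
        return store[key], False
    if parent is not None and parent.casefold() not in store:
        raise KeyError(f"parent term not found: {parent}")
    item = VocabItem(label=label, philobiblon_code=code, subclass_of=parent)
    store[key] = item
    return item, True


store = {}
# "HYPOTHETICAL*CODE" is a placeholder, not a real PhiloBiblon code.
add_term(store, "facsímil", "HYPOTHETICAL*CODE")
item, created = add_term(
    store, "facsímil impreso", "BIBLIOGRAPHY*RELATED_BIBCLASS*FAP", parent="facsímil"
)
print(item, created)
```

Returning the existing item on a duplicate, rather than raising an error, mirrors the stated goal of reusing terms already present in FactGrid.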

The default viewer used in FactGrid, the same as that used in Wikidata, is not user-friendly. Therefore Josep has created a prototype user interface, using data from the BETA Institutions table. We encourage you to play with it and tell us what you like or, more usefully, what you don’t like.

This change to Wikibase technology is designed to allow PhiloBiblon not only to take advantage of the linked open data of the semantic web, but also, and most importantly, to decrease sustainability costs. Because Wikibase is open-source software maintained by Wikimedia Deutschland, the software development arm of the Wikimedia Foundation, software maintenance costs for PhiloBiblon will be minimal in the future. This means that it will no longer be necessary to seek major grant support every five to seven years merely to keep up with technological change.

While this work has been going on, we have neglected neither the vital process of cleaning up PhiloBiblon data in order to facilitate the automated mapping nor the equally vital process of adding new information to PhiloBiblon. For example, Pedro Pinto, a member of the BITAGAP team, has recently discovered a “folha desmembrada” (BITAGAP manid 7862) from the Livro 4 of the chancery records of king Fernando I (1345-1383) (BITAGAP manid 3255), separated from the manuscript in the Arquivo Nacional da Torre do Tombo. The newly discovered dismembered leaf contains five previously unknown royal documents. It was being used as the cover of the “Livro de Acordãos, 1620-24,” in the archive of the Santa Casa de Misericórdia in Coruche, a small city in the Santarém district on the Tagus river northeast of Lisbon.

The recycling of parchment leaves from discarded medieval manuscripts, presumably for more socially beneficial purposes, such as the protection of administrative records, was common in both Spain and Portugal in the sixteenth and seventeenth centuries. Such leaves have been the source of many unknown or poorly documented medieval texts. Perhaps the most spectacular example was Harvey Sharrer’s discovery in 1990 of the eponymous Pergaminho Sharrer (BITAGAP manid 1817), with musical notation for seven poems of king Dinis of Portugal (1279-1325). This had been used as the binding of a collection of notarial documents (Lisboa: Arquivo Nacional da Torre do Tombo: Lisboa, Cartório Notarial de. N. 7-A, Caixa 1, Maça 1, livro 3).

Mariña Arbor Aldea
Arthur L-F. Askins
Vicenç Beltran Pepió
Álvaro Bustos Táuler
Antonio Cortijo Ocaña
Charles B. Faulhaber
Patricia García Sánchez-Migallón
Ángel Gómez Moreno
José Luis Gonzalo Sánchez-Molero
Almudena Izquierdo Andreu
Filipe Alves Moreira
María Morrás
Óscar Perea Rodríguez
Ricardo Pichel Gotérrez
Pedro Pinto
Maria de Lurdes Rosa
Nicasio Salvador Miguel
Martha E. Schaffer
Harvey L. Sharrer
Cristina Sobral
Lourdes Soriano Robles