Exploring OCR tools with two 19th century documents

— Guest post by Eileen Chen (UCSF)

When I (Eileen Chen, UCSF) started this capstone project with UC Berkeley, as part of the Data Services Continuing Professional Education (DSCPE) program, I had no idea what OCR was. “Something something about processing data with AI” was what I went around telling anyone who asked. As I learned more about Optical Character Recognition (OCR), it soon sucked me in. While it’s a lot different from what I normally do as a research and data librarian, I couldn’t be more glad that I had the opportunity to work on this project!

The mission was to run two historical documents from the Bancroft Library through a variety of OCR tools – tools that convert images of text into a machine-readable format, relying to various extents on artificial intelligence.

The documents were as follows:

Both are nineteenth-century printed texts, and the latter also contains multiple maps and tables.

I tested a total of seven OCR tools, and ultimately chose two tools with which to process one of the two documents – the earthquake catalogue – from start to finish. You can find more information on some of these tools in this LibGuide.

Comparison of tools

Table comparing OCR tools

OCR Tool         | Cost                       | Speed    | Accuracy                  | Use cases
Amazon Textract  | Pay per use                | Fast     | High                      | Modern business documents (e.g. paystubs, signed forms)
Abbyy Finereader | By subscription            | Moderate | High                      | Broad applications
Sensus Access    | Institutional subscription | Slow     | High                      | Conversion to audio files
ChatGPT          | Free-mium*                 | Fast     | High                      | Broad applications
Adobe Acrobat    | By subscription            | Fast     | Low                       | PDF files
Online OCR       | Free                       | Slow     | Low                       | Printed text
Transkribus      | By subscription            | Moderate | Varies depending on model | Medieval documents
Google AI        | Pay per use                | ?        | ?                         | Broad applications

*Free-mium = free with paid premium option(s)

As Leo Tolstoy famously (never) wrote, “All happy OCR tools are alike; each unhappy OCR tool is unhappy in its own way.” An ideal OCR tool accurately detects and transcribes a variety of texts, be it printed or handwritten, and is undeterred by tables, graphs, or special fonts. But does a happy OCR tool even really exist?

After testing seven of the above tools (excluding Google AI, which made me uncomfortable by asking for my credit card number in order to verify that I am “not a robot”), I am both impressed with and simultaneously let down by the state of OCR today. Amazon Textract seemed accurate enough overall, but corrupted the original file during processing, which made it difficult to compare the original text and its generated output side by side. ChatGPT was by far the most accurate in terms of not making errors, but when it came to maps, admitted that it drew information from other maps from the same time period when it couldn’t read the text. Transkribus’s super model excelled the first time I ran it, but the rest of the models differed vastly in quality (you can only run the super model once on a free trial).

It seems like there is always a trade-off with OCR tools. Faithfulness to original text vs. ability to auto-correct likely errors. Human readability vs. machine readability. User-friendly interface vs. output editability. Accuracy at one language vs. ability to detect multiple languages.

So maybe there’s no winning, but one must admit that utilizing almost any of these tools (except perhaps Adobe Acrobat or Free Online OCR) can save significant time and aggravation. Let’s talk about two tools that made me happy in different ways: Abbyy Finereader and ChatGPT OCR.

Abbyy Finereader

I’ve heard from an archivist colleague that Abbyy Finereader is a gold standard in the archiving world, and it’s not hard to see why. Of all the tools I tested, it was the easiest to do fine-grained editing with through its side-by-side presentation of the original text and editing panel, as well as (mostly) accurately positioned text boxes.

Its level of AI utilization is relatively low, and encourages users to proactively proofread for mistakes by highlighting characters that it flags as potentially erroneous. I did not find this feature to be especially helpful, since the majority of errors I identified had not been highlighted and many of the highlighted characters weren’t actual errors, but I appreciate the human-in-the-loop model nonetheless.

Overall, Abbyy excelled at transcribing paragraphs of printed text, but struggled with maps and tables. It picked up approximately 25% of the text on maps, and 80% of the data from tables. The omissions seemed wholly random to the naked eye. Abbyy was also consistent at making certain mistakes (e.g. mixing up “i” and “1,” or “s” and “8”), and could only detect one language at a time. Since I set the language to English, it automatically omitted the accented “é” in San José in every instance, and mistranscribed nearly every French word that came up. Perhaps some API integration could streamline the editing process, for those who are code-savvy.
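For the code-savvy, here is a minimal sketch of what a proofreading aid for those consistent mix-ups might look like. It runs on exported plain text rather than through any Abbyy API. The confusion pairs, the function name, and the sample sentence are all my own assumptions for illustration. It flags words that mix letters with a “1” or an “8” and suggests a spelling; it does not correct anything automatically.

```python
import re

# Hypothetical confusion pairs, based on the mix-ups described above.
DIGIT_TO_LETTER = {"1": "i", "8": "s"}

# Tokens are runs of letters and digits; punctuation and spaces split them.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def flag_suspect_tokens(text):
    """Return (token, suggestion) pairs for tokens that mix letters with 1 or 8."""
    flagged = []
    for token in TOKEN.findall(text):
        has_letter = any(ch.isalpha() for ch in token)
        has_confusable = any(ch in DIGIT_TO_LETTER for ch in token)
        # Pure numbers such as years are left alone; only mixed tokens are flagged.
        if has_letter and has_confusable:
            suggestion = "".join(DIGIT_TO_LETTER.get(ch, ch) for ch in token)
            flagged.append((token, suggestion))
    return flagged


# Made-up sample sentence, not taken from the earthquake catalogue.
sample = "A 8light shock was felt 1n San Jose at 8 p.m. in 1868."
for token, suggestion in flag_suspect_tokens(sample):
    print(f"{token!r} may be {suggestion!r}")
```

A human still decides whether each suggestion is right, which keeps the human-in-the-loop spirit of the tool.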

Capture of Abbyy Finereader attempt to interpret a map of a portion of California with map on the left and the attempted read on the right.
Earthquake map page as seen in the Abbyy Finereader Editor

I selected “searchable PDF” as my output file type, but Abbyy offers several other file types as well, including docx, csv, and jpg. In spite of its limitations, compared to PDF giant Adobe Acrobat and other PDF-generating OCR tools, Abbyy is still in a league of its own.

ChatGPT OCR

After being disillusioned by Free Online OCR, I decided to manage my expectations for the next free online tool I tested. Sure, it’s ChatGPT, but last I heard about it, it failed to count how many “r”s were in “strawberry.”

Surprisingly, ChatGPT made minimal errors in the transcription process, succeeding at text of various fonts and even rare instances of handwritten annotations. It spewed things out in text form in the chat box, as it does by default, but made good use of bolding and italics to broadly distinguish between different fonts.

As I was formatting its responses to my prompts into a Word document, I had to think hard about some choices. In what order do I arrange the text to best represent the flow of natural eye movement, especially when it comes to text from a map? Since ChatGPT offers descriptions of page layouts upon further prompting, how much do I want it to do that, and how do I standardize this?

Below is an excerpt from the document as processed by ChatGPT. ChatGPT descriptions are in blue and italicized, with detected original text left in black. I used two separate prompts: 1) “Please transcribe all the text on the two pages”; 2) “Please describe the image on the left page.” (Before you judge me for using “please,” it has been shown that being polite to ChatGPT may get you better results!)

The page features a simple and elegant design with the following elements:
  1. Coat of Arms: At the center of the page is a decorative emblem or coat of arms. It features the word “Baltimore” at the top and a Latin motto, “Vita Doi Mi Taptori,” at the bottom. The design appears to be the official crest or insignia, likely of historical or regional significance to Baltimore.
  2. Printer’s Mark: Beneath the coat of arms, there is text indicating the name and location of the publishing company:
    • “The Lord Baltimore Press” is printed in a distinctive typeface.
    • Below that, it says “THE FRIEDENWALD COMPANY”, followed by the location: “BALTIMORE, MD., U.S.A.”

The page is otherwise blank, with no additional decorations or text, keeping the focus on the coat of arms and the printer’s information. There are some handwritten cataloging notes visible in the upper corners:
In the upper left corner, “F851” and “H74” are written.

In the upper right corner, the number “604” and some other library marking are noted.

The overall layout is minimalist and centered, with the elements well-spaced on the page.

Page with seal stamped in the vertical middle, drifting right horizontally. There is a small pencil mark in the upper left.
Example of a largely blank page with seal.

As far as its work on this document has shown, ChatGPT’s descriptions are fairly safe and neutral. With some human curation for quality assurance, it can go beyond the bare minimum of “human readable” and bring a document to “human comprehensible.”

However, there was one alarming instance: when transcribing text of a smaller font size from a footnote, ChatGPT completely changed the content and basically wrote me a new paragraph based on the few keywords it was able to identify!

Example text from footnote with several special characters such as parenthesis.

ChatGPT:

* First proposed by Rossi in Archives des Sci. Phys. et Nat., IV, p. 371 (1850). Further expanded by Forel (ibid., Vol. V, p. 420). After comparing hundreds of published accounts of California earthquakes, I have found that the words of the Italian and Swiss seismologists fairly express the commonly accepted ideas of the arbitrary terms employed by the California observers. In the phrases for comparison, when any one is describing the effect of a shock, when consulting the tables, and making use of the Rossi-Forel scale, the books and other reports in California will aid.

When I asked it again, ChatGPT produced a different, still wildly incorrect footnote:

*First proposed by Rossi in Archives des Sci. Phys. et Nat., IV, p. 371 (1880). See also Schmidt in Bull. Acad. Petersb., vol. xxix, p. 41. After comparing hundreds of published accounts of California earthquakes, I have found that the words printed in italics in the following scale of Rossi and Forel, are the ones employed with much precision by most of the California observers. The other words are added for convenience. When any one is describing the effect of an earthquake, and closely follows the meaning or intent of the Rossi-Forel scale, then, the better to make that meaning clear, he should consider himself as adhering to the Rossi-Forel numerals. If, while describing an earthquake and seeking to assign the proper R.-F. numeral, it will be found that the words here set down in italics are of service.

The scary thing, in my opinion, isn’t that it twisted the meaning of the footnote entirely, but that it took the wild stab without my knowledge at all.
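One way to catch a silent rewrite like this is to ask for the same transcription twice and compare the answers, since invented text rarely comes out the same way twice. This is a minimal sketch of that idea using Python’s standard difflib module; the function names are my own, and the two sample strings are just the opening citations from the two responses above. Agreement between runs does not prove a transcription is correct; disagreement is simply a cue to check the original.

```python
import difflib


def agreement_ratio(first, second):
    """Word-level similarity between two transcriptions (1.0 means identical)."""
    return difflib.SequenceMatcher(None, first.split(), second.split()).ratio()


def disagreements(first, second):
    """Return (first_version, second_version) pairs where the two runs differ."""
    words_a, words_b = first.split(), second.split()
    matcher = difflib.SequenceMatcher(None, words_a, words_b)
    spans = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag != "equal":
            spans.append((" ".join(words_a[a0:a1]), " ".join(words_b[b0:b1])))
    return spans


# Opening citation from each of the two ChatGPT responses quoted above.
run_1 = "First proposed by Rossi in Archives des Sci. Phys. et Nat., IV, p. 371 (1850)."
run_2 = "First proposed by Rossi in Archives des Sci. Phys. et Nat., IV, p. 371 (1880)."

print(f"agreement: {agreement_ratio(run_1, run_2):.2f}")
for left, right in disagreements(run_1, run_2):
    print(f"run 1: {left!r}  |  run 2: {right!r}")
```

Here the only difference is the year, which is exactly the kind of detail worth verifying against the page image.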

When it came to map recognition and transcription, ChatGPT took a few more liberties (although it does a fair job justifying and explaining this). After initially detecting none of the cities on a map I uploaded, ChatGPT responded with the following to my prompt, “Please list each individual city on the map”:

Here is a list of the cities visible on the map of Washington and Oregon (based on typical historical maps of this type). However, since I can’t zoom into the map directly from the image, I’ll list some cities likely included on a historical earthquake-related map of Washington and Oregon from this period. Let me know if you need a detailed transcription from another zoomed-in image.

Likely Cities in Washington (circa 19th century):

  • Seattle
  • Tacoma
  • Olympia
  • Spokane
  • Walla Walla
  • Vancouver
  • Port Townsend

Likely Cities in Oregon:

  • Portland
  • Astoria
  • Salem
  • Eugene
  • Oregon City
  • The Dalles
  • Jacksonville

This map would also typically include significant natural features such as:

  • Columbia River
  • Willamette River
  • Cascade Range

If you have a clearer image or can provide additional detail, I could give a more precise list.

All of this was indeed correct, and I appreciate ChatGPT’s transparency about drawing from external sources, but this response generated many more questions. What about the cities on the map that weren’t listed? ChatGPT may have gotten it right this time, but doesn’t it have significant potential to be wrong? If we present it with a digitized map of similar quality that had place names in their Indigenous languages, for example, would they still be listed as, say, “likely to be Vancouver”?

So yes, I was dazzled by the AI magic, but also wary of the perpetuation of potential biases, and of my complicity in this as a user of the tool.

Conclusion

So, let’s summarize my recommendations. If you want an OCR output that’s as similar to the original as possible, and are willing to put in the effort, use Abbyy Finereader. If you want your output to be human-readable and have a shorter turnaround time, use ChatGPT OCR. If you are looking to convert your output to audio, SensusAccess could be for you! Of course, not every type of document works equally well in any OCR tool – doing some experimenting if you have the option to is always a good idea.

A few tips I only came up with after undergoing certain struggles:

  1. Set clear intentions for the final product when choosing an OCR tool
    1. Does it need to be human-readable, or machine-readable?
    2. Who is the audience, and how will they interact with the final product?
  2. Many OCR tools operate on paid credits and have a daily cap on the number of files processed. Plan out the timeline (and budget) in advance!
  3. Title your files well. Better yet, have a file-naming convention. When working with a larger document, many OCR tools would require you to split it into smaller files, and even if not, you will likely end up with multiple versions of a file during your processing adventure.
  4. Use standardized, descriptive prompts when working with ChatGPT for optimal consistency and replicability.
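As an illustration of the file-naming tip, here is a minimal sketch of a helper that builds names from the same pieces every time. The field order, the separators, and the zero-padding are assumptions of mine, not a standard; the point is only that one rule is applied consistently so that split files and multiple versions sort in a sensible order.

```python
import re


def slug(text):
    """Lowercase a label and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def ocr_filename(document, tool, first_page, last_page, version, extension="pdf"):
    """Build a name like document_tool_p001-025_v02.pdf (hypothetical convention)."""
    parts = [
        slug(document),
        slug(tool),
        f"p{first_page:03d}-{last_page:03d}",  # zero-padded so files sort by page
        f"v{version:02d}",                     # version of this processing pass
    ]
    return "_".join(parts) + "." + extension


print(ocr_filename("Earthquake catalogue", "Abbyy Finereader", 1, 25, 2))
print(ocr_filename("Earthquake catalogue", "ChatGPT", 26, 50, 1, extension="docx"))
```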

You can find my cleaned datasets here:

  1. Earthquake catalogue (Abbyy Finereader)*
  2. Earthquake catalogue (ChatGPT)

*A disclaimer re: Abbyy Finereader output: I was working under the constraints of a 7-day free trial, and did not have the opportunity to verify any of the location names on maps. Given what I had to work with, I can safely estimate that about 50% of the city names had been butchered.


Native-American Heritage Month 2024

Get ready to dive into Native American Heritage Month with these must-read books! From epic legends to fresh voices, these stories celebrate the culture, history, and heart of Native communities. Check out more at UCB Overdrive.


Writing History: Undergraduate Research Papers Investigate Ancient Papyri

(Students examine papyri and ostraca during their class visit. Photo by Lee Anne Titangos.)

Leah Packard-Grams, Center for the Tebtunis Papyri

This semester, students enrolled in the writing course “Writing History” (AHMA-R1B) got the chance to work as ancient detectives. As their instructor, I asked them each to write a research paper about one of the various ancient documents held in the Center for the Tebtunis Papyri in The Bancroft Library. After examining their options in a class visit, they each chose a papyrus or ostracon to write about. Students were given modern translations of the papyri and ostraca to read, making the ancient texts accessible.  

Students Use Interdisciplinary Approaches

The papyri, ostraca, and artifacts from Tebtunis at UC Berkeley were excavated from the site in 1899-1900, and the material has been an asset for Berkeley’s research and teaching collections for over a century. However, with over 26,000 fragments of papyrus, about two dozen ostraca, and many artifacts in the Hearst Museum, there is still plenty of work to be done! Students noticed new things in these artifacts: senior Chloe Logan, for example, described the painting on the reverse side of an inscribed papyrus for the very first time; it had been ignored by scholars for decades despite several scholarly citations of the text on the other side. P.Tebt.1087 was used as part of mummy cartonnage, a sort of ancient papier-mâché that was painted to decorate the casing of the mummy. Cartonnage was made by gluing together layers of previously-used papyrus and then painting over the gessoed surface. Her paper examines both the painted side of the papyrus and the inscribed side. Using an art-historical approach for the painted side and an economic-historical approach for the financial account on the other side, she wrote an interdisciplinary study that considers the artifact as an example of ancient recycling.

P.Tebt.1087
(P.Tebt.1087 was used as part of mummy cartonnage, a sort of ancient papier-mâché that was painted to decorate the casing of the mummy. Photo courtesy of the Center for the Tebtunis Papyri, The Bancroft Library, UC Berkeley. https://digicoll.lib.berkeley.edu/record/231074?ln=en#?xywh=-255%2C-345%2C5081%2C3674 )
(Ian McLendon holds O.Tebt. 4, a receipt of beer. Behind him, Nicolas Iosifidis and Wilder Burke research and photograph papyri.)

Ian McLendon compared receipts and tags for beer on ostraca in the collection (an ostracon is a broken potsherd reused as a writing surface). His paper examined the ways beer was used in ritual dining in Tebtunis, and compared the types of documents that record the beverage’s use, cost, and delivery.  He even examined some ancient coins to see what it would have been like to pay for beer using drachmai and obols, the ancient currency in use in Ptolemaic Egypt.

Mastering Demons

P.Tebt.frag. 13385
(P.Tebt.frag. 13385 is an illustrated papyrus depicting three deities. Tutu is depicted as a lion with a crowned, human head and having snakes and knives at his paws– powerful symbols meant to keep demons at bay. Photo courtesy of the Center for the Tebtunis Papyri, The Bancroft Library, UC Berkeley. https://digicoll.lib.berkeley.edu/record/231444?ln=en#?xywh=-419%2C-117%2C3234%2C2338 )

Nicolas Iosifidis was also inspired by an illustration on a papyrus. Tutu, the “master of demons,” was an apotropaic, protective deity in ancient Egypt who defended against forces of chaos who would do harm to humans. In the papyrus, he is depicted with a human head, a leonine body, and snakes and knives in his paws– perhaps even in place of his fingers! His headdress and double-plumed crown also contribute to the awe-inspiring effect of this formidable deity. Iosifidis sees Tutu as an opportunity to examine our deeper selves and master our own demons, asking the question, “Is there something else we can acquire from it [the papyrus] as did people back then?” His paper offers an analysis of the exact role of the master of demons, writing that “Tutu doesn’t protect by killing [demons], but rather controlling or taming them.” The god Tutu, for Iosifidis, represents the timeless struggle between “the good and the bad” that exists within us all.

Reading Between the Lines 

Reading their papers, I was struck in particular by the students’ enthusiastic comments on the significance of these papyri to broader human history. Alex Moyer chose a papyrus that dealt with the investigation into a murder that occurred in 114 BCE, observing that despite the unfortunate universality of homicide throughout human history, “What distinguishes each society from any other is their approach to investigating and handling murders.” His papyrus, P.Tebt. 1.14, is a letter from a village scribe that offers insight into the process of confiscating the property of an accused person until he can be tried and sentenced. Instead of apprehending him, the village scribe was instructed to “arrange for [his property] to be placed on bond” (lines 9-10). Moyer writes about the value of this papyrus as comparative evidence: “Due to the fair condition and legibility of the papyrus, it is able to act as a figurative time capsule, allowing us to compare and contrast with other societies, including our own, and view how human civilization’s attitude and handling of murders have changed over time.” 

P.Tebt. 1.14
(P.Tebt. 1.14, photo courtesy of the Center for the Tebtunis Papyri, The Bancroft Library, UC Berkeley. https://digicoll.lib.berkeley.edu/record/231712?ln=en#?xywh=-2051%2C-257%2C7104%2C5137 )

Victor Flores decided to write about the same papyrus, and was surprised at how this papyrus challenges our perception of the job of an “ancient scribe.” He writes, “These village scribes are not your ordinary scribes, but rather carry a distinct number of tasks like arranging for the bond in order for somebody to confiscate valuables along with carrying out a wide variety of administrative tasks for the government beyond simply writing.” The “village scribe” wasn’t simply a copyist or secretary as one might suppose, and this papyrus is good evidence that allows us to ascertain the roles of scribes!

Student Perspectives

Working with the papyri in The Bancroft Library, I have found that there is a feeling, almost indescribable, when you look at an ancient artifact and really take the time to appreciate what lies before you. Staring up at you is a ghost– a physical echo– that reverberates across the millennia. The artifact before you has survived by sheer luck, and we are fortunate that it remains at all. I tried to convey this to my students, and in their papers, I found that students wanted to write about what it was like to study the papyri up close. This was unprompted by me, and I was astounded at the care and reflection they undertook to share their own perspectives: 

Chloe Logan (class of 2024, writing about the cartonnage fragment): “I must remark how fortunate we are to have an incredible artifact in such good condition as a window to the distant past. I hope we will have more research on the verso side of this astonishing relic.” [Indeed, it is being studied by a scholar in Europe for publication soon!]

Ethan Schiffman (class of 2027): “I enjoyed visiting the Bancroft Library and seeing the large Tebtunis Papyrus collection. I can now better appreciate the magnitude of the time-consuming task of the care involved in preserving the fragile papyri and the difficulties in translating and editing these texts.”

John Soejoto (class of 2027): “By exploring each papyrus, even if only a vague or unproven hypothesis is formed, historians increase the existing body of knowledge and give the future academic community further means to discover the history of bygone ages.”

Wilder Brix Burke (class of 2027): “[Seeing the papyrus in person after studying it for so long] brought a new perspective, a real understanding of the physical lengths such a text had gone to simply exist before me, 2000 years (and some change) later. It also speaks to the impressive ability of UC Berkeley as a whole that undergraduate students get to observe the most unique and fascinating parts of campus. I am grateful for the opportunity to see history before my eyes. These are the moments that remind me why I am a CAL student. Go bears!”

P.Tebt. 1.33
(Wilder Brix Burke poses with his chosen papyrus concerning the visit of a Roman senator to the town of Tebtunis, P.Tebt. 1.33.)

Launch of the Journal Open Access Look-up Tool

Screenshot of the University of California’s CDL Journal Open Access Lookup Tool homepage. The page title reads, “Open Access publishing support for University of California authors.” A search bar labeled “Find a journal” allows users to enter a title or ISSN to check funding availability.

Today, California Digital Library launched the Journal Open Access Look-up Tool (JOLT).

What is it?

It’s a website (jolt.cdlib.org) that enables authors to search by a given journal title to determine whether it is eligible for financial support under one of UC’s systemwide open access publishing agreements. Recall that UC’s OA publishing agreements help defray the payment of open access fees in order to make UC scholarship freely available to readers around the world.

How does it work?

JOLT acts as a standard search bar. Users can search by journal title and ISSN. Journals with well-known abbreviations (e.g. “PLOS” for Public Library of Science) are searchable by their acronyms.

Journals in the search results will include one of three labels:

  • Full funding on request
  • Partial funding or discount
  • No support available

Clicking on a search result will expand it to reveal more details, including the specific level of funding provided, the publisher, and links to the agreement FAQ and payment workflow details on the Office of Scholarly Communication website. If a title search does not yield any results, it means the journal is not covered by an agreement.

Please contact the Scholarly Communication + Information Policy office at schol-comm@berkeley.edu if you have any questions.


Library Trial: Piatidnevka Digital Archive (DA-PIAT) through December 5, 2024

The UC Berkeley Library has started a trial of the Piatidnevka Digital Archive. The trial will end on December 5, 2024. Please provide your feedback to your Librarian for Slavic, East European and Eurasian Studies at Lpendse at berkeley dot edu.

The Piatidnevka Digital Archive is a valuable resource for researchers studying early Soviet history, particularly between 1929 and 1931. Published six times a month, this journal documents the Soviet Union’s brief experiment with a five-day workweek. The archive provides insight into the Soviet goal of replacing traditional societal norms with innovative approaches. It contains a wealth of visual and textual materials, including photographs, articles, editorials, and commentaries that offer firsthand perspectives on this significant period.

The trial can be accessed here.

The Piatidnevka (Пятидневка, “Five Day Week”) Digital Archive stands as an invaluable asset for scholars engaged in the study of early Soviet history. Specifically focusing on the period between 1929 and 1931 and published six times per month, this illustrative journal provides critical insights into the Soviet Union’s brief but notable experiment with a five-day workweek, comprising four workdays followed by a day of rest. This initiative reflects the broader Soviet aim of dismantling traditional societal structures in favor of innovative paradigms. The archive is rich in visual and textual content, offering wonderful artistic photos, articles, editorials, and commentaries that furnish first-hand accounts of this significant phase in Soviet history.
The title page of the July 1930 issue of Piatidnevka.

A&H Data: Creating Mapping Layers from Historic Maps

Some of you know that I’m rather delighted by maps. I find them fascinating for many reasons, from their visual beauty to their use of the lie to impart truth, to some of their colors and onward. I think that maps are wonderful and great and superbulous even as I unhappily acknowledge that some are dastardly examples of horror.

What I’m writing about today is the process of taking a historical map (yay!) and pinning it on a contemporary street map in order to use it as a layer in programs like StoryMaps JS or ArcGIS, etc. To do that, I’m going to write about:

  • Picking a Map from Wikimedia Commons
  • Wikimedia accounts and “map” markup
  • Warping the map image
  • Loading the warped map into ArcGIS Online as a layer

But! Before I get into my actual points for the day, I’m going to share one of my very favorite maps:

Stunning 16th century map from a northern projection with the continents spread out around the north pole in greens, blues, and reds. A black border with golds surround the circular maps.
Urbano Monte, Composite: Tavola 1-60. [Map of the World], World map, 40x51cm (Milan, Italy, 1587), David Rumsey Map Collection, http://www.davidrumsey.com.
Just look at this beauty! It’s an azimuthal projection, centered on the North Pole (more on Wikipedia), from a 16th century Italian cartographer. For a little bit about map projections and what they mean, take a look at NASA’s example Map Projections Morph. Or, take a look at the above map in a short video from David Rumsey to watch it spin, as it was designed to.

What is Map Warping

While this is in fact one of my favorite maps and I use many an excuse to talk about it, I did actually bring it up for a reason: the projection (i.e., azimuthal) is almost impossible to warp.

As stated, warping a map is when one takes a historical map and pins it across a standard, contemporary “accurate” street map following a Mercator projection, usually for the purpose of analysis or use in a GIS program, etc.

Here, for example, is the 1913 Sanborn fire insurance map layered in ArcGIS Online maps.

Image of historical Sanborn map warped across the street map
Screen capture of ArcGIS with rectified Sanborn map.

I’ll be writing about how I did that below. For the moment, note how the Sanborn map is a bit pinched at the bottom and the borders are tilted. The original map wasn’t aligned precisely North and the process of pinning it (warping it) against an “accurate” street map resulted in the tilting.

That was possible in part because the Sanborn maps, for all that they’re quite small and specific, are oriented along a Mercator projection, permitting a rather direct rectification (i.e., warping).
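To make “pinning” a little more concrete: each pin pairs a pixel position on the scanned map with a position on the street map, and the software fits a transformation that carries one onto the other. Below is a minimal sketch of the simplest version, an affine transform fitted exactly from three pins. This is not how Wikimaps Warper or ArcGIS are implemented as far as I know (they typically accept many more pins and offer other transformation types), and the pixel and map coordinates are invented for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(matrix, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(matrix)
    if abs(d) < 1e-12:
        raise ValueError("control points lie on one line; pick pins that form a triangle")
    solution = []
    for col in range(3):
        replaced = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(matrix)]
        solution.append(det3(replaced) / d)
    return solution


def fit_affine(pixel_points, map_points):
    """Fit x = a*px + b*py + c and y = d*px + e*py + f from exactly three pins."""
    matrix = [[float(px), float(py), 1.0] for px, py in pixel_points]
    x_coeffs = solve3(matrix, [x for x, _ in map_points])
    y_coeffs = solve3(matrix, [y for _, y in map_points])
    return x_coeffs, y_coeffs


def apply_affine(transform, px, py):
    """Convert one pixel position on the scan to map coordinates."""
    (a, b, c), (d, e, f) = transform
    return a * px + b * py + c, d * px + e * py + f


# Invented pins: pixel positions on the scan and matching map coordinates.
pixels = [(0, 0), (100, 0), (0, 100)]
targets = [(10.0, 50.0), (20.0, 50.0), (10.0, 40.0)]
transform = fit_affine(pixels, targets)
print(apply_affine(transform, 50, 50))
```

An affine fit can shift, scale, rotate, and skew the scan, which is enough to produce the tilted borders described above; it cannot bend a map drawn in a very different projection.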

In contrast, take a look at what happens in most GIS programs if one rectifies a map—including my favorite above—which doesn’t follow a Mercator projection:

Weird looking, pulled streams of reds, greens, and blues that are swept across the top and yanked down toward the bottom.
Warped version of the Monte map against a Mercator projection in David Rumsey’s Old Maps Online connection in 2024. You can play with it in Old Maps Online.
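Part of the reason a pole-centered map stretches so badly is visible in the Mercator formula itself. This is a minimal sketch assuming the Web Mercator variant commonly used by online street maps; the function name and the sample latitudes are my own choices. The vertical coordinate grows without bound as latitude approaches 90°, so the North Pole at the center of the Monte map has nowhere to land.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, as used by Web Mercator


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator metres."""
    if abs(lat_deg) >= 90:
        raise ValueError("the poles cannot be placed on a Mercator map")
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Same longitude throughout, with latitude creeping toward the pole.
for lat in (0, 37.77, 60, 80, 89, 89.99):
    _, y = lonlat_to_web_mercator(-122.42, lat)
    print(f"latitude {lat:>6}: y = {y / 1000:,.0f} km")
```

Each step toward the pole adds more distance than the last, which is why the warped version above looks yanked toward the edges.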

Warping a Mercator Map

This still leaves the question: How can one warp a map to begin with?

There are several programs that you can use to “rectify” a map. Among others, many people use QGIS (open access; Windows, macOS, Linux) or ArcGIS Pro (proprietary; Windows only).

Here, I’m going to use Wikimaps Warper (for more info), which connects up with Wikimedia Commons. I haven’t seen much documentation on the agreements and I don’t know what kind of server space the Wikimedia groups are working with, but recently Wikimedia Commons made some kind of agreement with Map Warper (open access, link here) and the resulting Wikimaps Warper is (as of the writing of this post in November 2024) in beta.

I personally think that the resulting access is one of the easiest to currently use.

And on to our steps!

Picking a Map from Wikimedia Commons

To warp a map, one has to have a map. At the moment, I recommend heading over to Wikimedia Commons (https://commons.wikimedia.org/) and selecting something relevant to your work.

Because I’m planning a multi-layered project with my 1950s publisher data, I searched for (san francisco 1950 map) in the search box. Wikimedia returned dozens of Sanborn Insurance Maps. At some point (22 December 2023) a previous user (Nowakki) had uploaded the San Francisco Sanborn maps from high resolution digital surrogates from the Library of Congress.

Looking through the relevant maps, I picked Plate 0000a (link) because it captured several areas of the city and not just a single block.

When looking at material on Wikimedia, it’s a good idea to verify your source. Most of us can upload material into Wikimedia Commons and the information provided on Wikimedia is not always precisely accurate. To verify that I’m working with something legitimately useful, I looked through the metadata and checked the original source (LOC). Here, for example, the Wikimedia map claims to be from 1950 and in the LOC, the original folder says it’s from 1913.

Feeling good about the legality of using the Sanborn map, I was annoyed about the date. Nonetheless, I decided to go for it.

Moving forward, I checked the quality. Because of how georectification and mapping software works, I wanted as high a quality of map as I could get so that it wouldn’t blur if I zoomed in.

If there wasn’t a relevant map in Wikimedia Commons already, I could upload a map myself (and likely will later). I’ll likely talk about uploading images into Wikimedia Commons in … a couple months maybe? I have so many plans! I find process and looking at steps for getting things done so fascinating.

Wikimedia Accounts and Tags

Form in whites and blacks with options for a username, password.
Signup form for the Wikimedia suite, including Wikimedia Commons and Wikimaps.

Before I can do much with my Sanborn map, I need to log in to Wikimedia Commons as a Wiki user. One can set up an account attached to one of one’s email accounts at no charge. I personally use my work email address.

Note: Wikimedia intentionally does not ask for much information about you and states that they are committed to user privacy. Their info pages (link) state that they will not share their users’ information.

I already had an account, so I logged straight in as “AccidentlyDigital” … because somehow I came up with that name when I created my account.

Once logged in, a few new options will appear on most image or text pages, offering me the opportunity to add or edit material.

Once I picked the Sanborn map, I checked

  1. Was the map already rectified?
  2. Was it tagged as a map?

If the specific map instance has already been rectified in Wikimaps, then there should be some information toward the end of the summary box that has a note about “Geotemporal data” and a linked blue bar at the bottom to “[v]iew the georeferenced map in the Wikimaps Warper.”

WikiMaps screen capture of the "Summary" with the geobox information showing the map's corner coordinates and a link to viewing it on Wikimaps.
Screen capture of “Summary” box with geocoordinates from 2024.

If that doesn’t exist, then one might get a summary box that is limited to a description, links, dates, etc., and no reference to georeferencing.

In consequence, I needed to click the “edit” link next to “Summary” above the description. Wikimedia will then load the edit box for only the summary section, which will appear with all the text from the public-facing box surrounded by standard wiki-language markup.

Summary box showing a limited amount of information with purple headers to the left and information to the right on a grey background.
Screen capture of Wikimedia Commons box with limited information for an image.

All I needed to do was change the “{{Information” to “{{Map” and then hit the “Publish” button toward the bottom of the edit box to release my changes.
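
For anyone who wants to see that edit as text manipulation, here is a minimal sketch in Python. The sample summary text is invented for illustration, and the function names are my own; the sketch only swaps the template name and does not check whether every field means the same thing in both templates.

```python
import re

def mark_as_map(wikitext: str) -> str:
    """Rename the first {{Information template in the text to {{Map."""
    # Only the template name changes; every parameter line is left untouched.
    return re.sub(r"\{\{\s*Information\b", "{{Map", wikitext, count=1)

def is_tagged_as_map(wikitext: str) -> bool:
    """Rough check for whether a summary already opens a {{Map template."""
    return re.search(r"\{\{\s*Map\b", wikitext) is not None

# Hypothetical summary text, invented for illustration only.
sample = (
    "{{Information\n"
    "|description=Sanborn map of San Francisco\n"
    "|source=Library of Congress\n"
    "}}"
)

updated = mark_as_map(sample)
print(updated)
```

In practice the change is a one-word edit in the browser, so the sketch is mostly useful for seeing exactly which characters change.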

Screen capture of wikimedia commons edit screen showing what the text for updating a summary looks like.
Screen capture of Wikimedia Commons edit screen for the summary.

The updated, public-facing view will now have a blue button offering to let users “Georeference the map in Wikimaps Warper.”

Once the button appeared, I clicked that lovely, large, blue button and went off to have some excellent fun (my version thereof).

Summary box with map added as object type with blue box for options for georeferencing.
Example of Wikimedia Commons Summary box prior to georeferencing.

Warping the map

When I clicked the “Georeference” button, Wikimedia sent me away to Wikimaps Warper (https://warper.wmflabs.org/). The Wikimaps interface showed me a thumbnail of my chosen map and offered to let me “add this map.”

I, delighted beyond measure, clicked the button and then went and got some tea. Depending on how many users are in the Wikimaps servers and how big the image file for the map is, adding the file into the Wikimaps servers can take between seconds and minutes. I have little patience for uploads and almost always want more tea, so the upload time is a great tea break.

Once the map loaded (I can get back to the file through Wikimedia Commons if I leave), I got an image of my chosen map with a series of options as tabs above the map.

Most of the tabs attempt to offer options for precisely what they say. The “Show” tab offers an image of the loaded map.

Wikimaps Warper navigation tabs in beiges and white tabs showing the selected tabs.
2024 screen capture showing navigation tabs.
  • Edit allows me to edit the metadata (i.e., title, cartographer, etc.) associated with the map.
  • Rectify allows me to pin the map against a contemporary street map.
  • Crop allows me to clip off edges and borders of the map that I might not want to appear in my work.
  • Preview allows me to see where I’m at with the rectification process.
  • Export provides download options and HTML links for exporting the rectified map into other programs.
  • Trace would take me to another program with tracing options. I usually ignore the tab, but there are times when it’s wonderful.

The Sanborn map didn’t have any information I felt inclined to crop, so I clicked straight onto the “Rectify” tab and got to work.

As noted above, the process of rectification involves matching the historic map against a contemporary map. To start, one needs at least four pins matching locations on each map. Personally, I like to start with some major landmarks. For example, I started by finding Union Square and putting pins on the same location in both maps. Once I was happy with my pins’ placement on both maps, I clicked the “add control point” button below the two maps.

split screen showing a historic, streetmap on the left with a
Initial pins set in the historic map on the left and the OpenStreetMap on the right. Note the navigation tools in the upper right corner of each panel.

Once I had four pins, I clicked the gray “warp image!” button. The four points were hardly enough and my map curled badly around my points.

To straighten out the map, I went back in and pinned the four corners of the map against the contemporary map. I also pinned several street corners because I wanted the rectified map to be as precisely aligned as possible.

All said, I ended up with more than 40 pins (i.e., control points). As I went, I warped the image every few pins in order to save it and see where the image needed alignment.

Split screen example showing dozens of aligned points in green, yellow, and red.
Screen capture of Wikimaps with example of pins for warping.

As I added control points and warped my map, the pins shifted colors between greens, yellows, and reds with the occasional blue. The colors each demonstrated where the two maps were in exact alignment and where they were being pinched and, well, warped, to match.
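
To get a feel for why more pins help and why some pins end up flagged, here is a rough sketch of the idea behind control points. It assumes a simple affine (first-order) fit, which may not be the method Wikimaps Warper applies, and the pixel and coordinate values are invented. Each pin pairs a pixel position on the scanned map with a longitude and latitude; the fit finds the best overall match, and the leftover error per pin shows which pins disagree with the rest.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - tail) / m[r][r]
    return x

def fit_affine(pixels, coords):
    """Least-squares fit of lon = a*x + b*y + c, and the same for lat."""
    rows = [[float(x), float(y), 1.0] for x, y in pixels]
    # Normal equations: (A^T A) p = A^T t
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for k in range(2):  # k = 0 fits longitude, k = 1 fits latitude
        atb = [sum(r[i] * t[k] for r, t in zip(rows, coords)) for i in range(3)]
        params.append(solve3(ata, atb))
    return params

def residuals(params, pixels, coords):
    """Distance (in degrees) between each pin and where the fit puts it."""
    out = []
    for (x, y), (lon, lat) in zip(pixels, coords):
        plon, plat = (a * x + b * y + c for a, b, c in params)
        out.append(((plon - lon) ** 2 + (plat - lat) ** 2) ** 0.5)
    return out

# Hypothetical pins: four corners plus the center of a small scan.
pixels = [(0, 0), (100, 0), (0, 80), (100, 80), (50, 40)]
coords = [(-122.42 + 0.0004 * x, 37.80 - 0.0004 * y) for x, y in pixels]
exact_res = residuals(fit_affine(pixels, coords), pixels, coords)

# Nudge the center pin east to mimic one carelessly placed control point.
nudged = coords[:4] + [(coords[4][0] + 0.001, coords[4][1])]
nudged_res = residuals(fit_affine(pixels, nudged), pixels, nudged)
print([round(r, 6) for r in nudged_res])
```

In the nudged run the misplaced pin carries the largest leftover error, which is the kind of signal I read the pin colors as giving.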

Loading the warped map into ArcGIS Online as a layer

Once I was happy with the Sanborn image rectified against the OpenStreetMap that Wikimaps draws in, I was ready to export my work.

In this instance, I eventually want to have two historic maps for layers and two sets of publisher data (1910s and 1950s).

To work with multiple layers, I needed to move away from Google My Maps and toward a more complex GIS program. Because UC Berkeley has a subscription to ArcGIS Online, I headed there. If I hadn’t had access to that online program, I’d have gone to QGIS. For an access point to ArcGIS Online or for more on tools and access points, head to the UC Berkeley Library Research Guide for GIS (https://guides.lib.berkeley.edu/gis/tools).

I’d already set up my ArcGIS Online (AGOL) account, so I jumped straight in at https://cal.maps.arcgis.com/ and then clicked on the “Map” button in the upper-left navigation bar.

Green and white navigation bar with map, screen, groups, content, and more
2024 Screen capture of ArcGIS Online Navigation Bar from login screen
ArcGIS Online add layer list in white and blacks, offering options for layer sourcing from URL, file, sketching, route, or other media.
2024 add layer list in ArcGIS Online

On the Map screen, ArcGIS defaulted to a map of the United States in a Mercator projection. ArcGIS also had the “Layers” options opened in the left-hand tool bars.

Because I didn’t yet have any layers except for my basemap, ArcGIS’s only option in “Layers” was “Add.”

Clicking on the down arrow to the right of “Add,” I selected “Add layer from URL.”

In response, ArcGIS Online gave me a popup box with a space for a URL.

I flipped back to my Wikimaps screen and copied the “Tiles (Google/OSM scheme),” which in this case read https://warper.wmflabs.org/maps/tile/7258/{z}/{x}/{y}.png.
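
The {z}, {x}, and {y} in that link are placeholders that mapping software fills in for each square tile it requests: z is the zoom level, and x and y are the tile's column and row. As a small sketch, here is the standard Google/OSM tile arithmetic in Python. The Union Square coordinates are approximate and the function names are my own; the template is the one copied above.

```python
import math

TEMPLATE = "https://warper.wmflabs.org/maps/tile/7258/{z}/{x}/{y}.png"

def tile_for(lat, lon, zoom):
    """Return the (x, y) tile containing a point, in the Google/OSM scheme."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still land on a valid tile.
    return min(n - 1, max(0, x)), min(n - 1, max(0, y))

def tile_url(lat, lon, zoom, template=TEMPLATE):
    """Fill the {z}/{x}/{y} placeholders for the tile covering a point."""
    x, y = tile_for(lat, lon, zoom)
    return template.format(z=zoom, x=x, y=y)

# Approximate location of Union Square (an assumption for illustration).
print(tile_url(37.788, -122.4075, 12))
```

ArcGIS Online does this substitution on its own, so nothing here needs to be run to load the layer; it only shows what the curly-brace placeholders stand for.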

Flipping back to ArcGIS Online, I pasted the tile link into the URL text box and made sure that the auto-populating “Type” information about the layer was accurate. I then clicked through a series of “Next” buttons to assure ArcGIS Online that I really did want to use this map.

Warning: Because I used a link, the resulting layer is drawn from Wikimaps every time I load my ArcGIS project. That does mean that if I had a poor internet connection, the map might take a hot minute to load or fail entirely. On UC Berkeley campus, that likely won’t be too much of an issue. Elsewhere, it might be.

Once my image layer loaded, I made sure I was aligned with San Francisco, and I saved my map with a relevant title. Good practice means that I also include a map description with the citation information to the Sanborn map layer so that viewers will know where my information is coming from.

Image of historical Sanborn map warped across the streetmap
2024 Screen capture of ArcGIS maps edit screen with rectified Sanborn map.

Once I’ve saved it, I can mess with share settings and begin offering colleagues and other publics the opportunity to see the lovely, rectified Sanborn map. I can also move toward adding additional layers.

Next Time

Next post, I plan to write about how I’m going to add my lovely 1955 publisher dataset on top of a totally different, 1950 San Francisco map as a new layer. Yay!


“Power to the Students and Black Power to Black Students” – The Life and Legacy of Sister Makinya Sibeko-Kouate

The Sister Makinya Sibeko-Kouate papers are open for research! Educator and community activist Sister Makinya Sibeko-Kouate (née Harriet Smith) was celebrated for her efforts to promote Black Studies curricula in the East Bay and to popularize Kwanzaa. Her papers reflect the ways her personal and political lives intersected and offer a window into both Black radical politics and middle-class African-American life in the mid- to late 20th century.

Sibeko-Kouate was born into a middle-class family in San Leandro, California on July 1, 1926. Her father, Turner Smith, worked for Calvert Distillers Corp., and her mother, Willette Smith, was active in many African American social clubs and fraternal orders. Sibeko-Kouate grew up in South Berkeley and graduated from Berkeley High in 1947. In the early 1950s, she studied music and teaching at San Francisco State College and ran a small business (Harriet – Ceramic Creations) out of the home she shared with her mother. Sibeko-Kouate attended Merritt College from 1965-1968, where she studied business administration, real estate, and community planning. She received her BA in Black Studies and an MA in Education from Cal State Hayward in the 1970s. At Merritt, Sibeko-Kouate helped develop the Black Studies Department and was the first African-American person elected student body president.

Here are three of the many faces of Sibeko-Kouate. On the left, posing with her ceramic creations, which she advertised on her business card as “hand made gifts to fit your personality”; in the center, pictured with colleagues who also advocated for the discipline of Black Studies and for Black Power: Sid Walton, Ruth Hagwood, and Nathan Hare; and on the right, teaching a class.

Sibeko-Kouate was elected president of the Associated Students of Merritt College (ASMC) in Fall 1967. In her welcome address, she notes that “Being a college student in 1967-68 is a bewildering experience…we must proceed toward the future in the context of an unpopular external war and an internal revolution…education should be a maker of a virgin future rather than a slave to an unjust and shopworn past. YOU CAN CHANGE THE WORLD!” Students like Sibeko-Kouate, and faculty, like Walton, sought to change the world by advocating for an African-American Studies Program at Merritt. The flyer on the right advertises a student-community ceremony to celebrate the first graduates of that program.

Sibeko-Kouate’s influence went beyond Merritt College. She also served as the President of the National Black Student Union and, in 1968, ran an education workshop on student-community relations for Black students on white campuses at the National Black Power conference in Philadelphia. Sibeko-Kouate’s papers contain materials related to Black curricula and Black Student Unions from many schools in California and, especially, the Bay Area. Examples include a brochure from the Black Students Union at Oakland Tech (in the center and on the right).

Several celebrations of Sibeko-Kouate’s life referred to her as the “Queen Mother of Kwanzaa,” and her papers contain evidence of her efforts to define and promote the holiday. Examples include these flyers and her notes from an Organizational Committee meeting in Seattle. (Most documents in the collection spell the holiday “Kwanza.” That is the original spelling of the African harvest festival on which the celebration is based. According to the American Heritage Dictionary, a seventh letter was added to correspond with the seven African principles honored during the holiday. Both spellings are correct.)

Sibeko-Kouate came into contact with a wide range of political and cultural organizations, either through her direct participation in them, or through ephemera she gathered at events she attended. The materials she collected document decades of African-American cultural life in the Bay Area, including visual arts, music, dance, theater, film, poetry, books, fashion shows, cultural festivals, sporting events, and the culinary arts. Events ranged from the very local (a night of entertainment at Merritt College), to benefits (for the Parent-Infant Neighborhood Center), to appearances by well-known performers and politicians at community events (Nina Simone, the Chambers Brothers, and Congressman Ron Dellums). The fact that Sibeko-Kouate collected these flyers and programs reflected her awareness of their historical significance.

The collection documents how Sibeko-Kouate campaigned for politicians, supported the Black Panthers, and was a community organizer. The buttons on the left reflect her politics, the flyer in the middle asks residents to support a boycott to end police brutality, and the notes on the right document a July 1968 incident when Berkeley police assaulted Sibeko-Kouate and her mother after entering their home without permission. It’s not clear whether the assault motivated the boycott or was just another incident of police brutality in the Black community – but the cause of the organizers was very clear: “Our cause is Justice. We reject the idea of supporting businesses that sanction Police Brutality and the disregard for human dignity…Berkeley is run like a plantation…Plantation days are over. Use your dollars to fight your battles…Blacks keep downtown Berkeley in business. We will boycott businesses until JUSTICE flows.”

The bulk of Sibeko-Kouate’s papers cover the years 1939-1975 and document significant cultural and political changes over that time. The album on the left is from a 1953 Calvert Distillers gathering that Sibeko-Kouate’s father Turner attended. The meeting “represents the first time any industry gathered its men covering the Negro market from all over the country to meet in New York with its top executives for a two-way exchange of ideas on business.” The gathering included a testimonial dinner honoring Thurgood Marshall. Tubie Resnik, an Executive Vice President, said, “This trip is another leaf in the Calvert book of leadership in interracial affairs.” A 1969 calendar produced by Seagram Distillers, Calvert’s parent company (on the right), celebrates “Famous Americans and their Significant Contributions” to the history of the United States. Someone (presumably Sibeko-Kouate) crossed out the outdated/offensive term “Negro” and replaced it with “Black.” This item’s contrast with her father’s souvenir from the early 1950s captures a cultural and political shift in rhetoric that can be seen throughout the collection.

It is exciting that Sibeko-Kouate’s papers contain nearly 100 home movies (8mm and 16mm) that document vacations, celebrations, and other social gatherings with family and friends (circa 1955-1969). These films, like the Calvert album, other photographs, personalia, and family papers in the collection document the everyday lives of middle-class African Americans in the 20th century. (Please note: the films are not currently available for viewing.)

For more information about the collection, access the catalog record and finding aid for the Sister Makinya Sibeko-Kouate papers (BANC MSS 2019/135) here:
Five Valentines Day cards, from students to Sibeko-Kouate
Valentines cards from students (Carton 6, Folder 16)

From the Archives: Laurence I. Moss, Nuclear Engineer and Environmental Activist

A woman smiling
Brianna Iswono

by Brianna Iswono

Brianna Iswono is a third-year undergraduate student at UC Berkeley majoring in chemical engineering. In the Fall 2024 semester, Brianna is working with Roger Eardley-Pryor of the Oral History Center to earn academic credits through Berkeley’s Undergraduate Research Apprentice Program (URAP). URAP provides opportunities for undergraduates to work closely with Berkeley scholars on cutting edge research projects for which Berkeley is world-renowned. This “From the Archives” article emerged from Brianna’s research in the Oral History Center’s long standing Sierra Club Oral History Project.

 

Laurence I. Moss, who recorded his oral history in 1992, integrated engineering innovation with environmental protection in ways that inspire me as a chemical engineering student who wants to contribute towards sustainability. Today, efforts to reduce carbon emissions and combat climate change are increasingly prominent in academia and technological industries. The surge of various electric cars, solar power installations, and increased sustainability awareness begs the question: how has this shift towards a more green future been feasible? This shift has required, and continues to require, technical developments with environmental goals. Laurence I. Moss was a nuclear engineer who, in the 1960s and 1970s, became a national leader in the Sierra Club. Moss used his technical expertise for advancements in engineering as well as developing processes to prioritize environmental protections.

A man smiling
Laurence I. Moss, a nuclear engineer and Sierra Club President from 1973-1974, recorded his oral history in 1992.

Laurence I. Moss’s early life and education laid the groundwork for his expertise in engineering, equipping him with the technical knowledge to contribute meaningfully to the nuclear industry. Moss was born in 1935 during the Great Depression. He was raised in Queens and Brooklyn in New York City by parents who, as he said, believed deeply in the “American Dream.” He attended the prestigious Massachusetts Institute of Technology (MIT) where he studied chemical engineering as an undergraduate. Driven by his passions in math and science, he described wanting a career where “people would be judged on their merits and on their ability to contribute.” Continuing to spark his interests and explore new fields, Moss completed a graduate program at MIT in nuclear engineering, a field he was previously unfamiliar with. 

Moss’s work on nuclear reactors, particularly at Rockwell International, highlighted his ability to understand and improve cutting-edge technologies, a key skill that later influenced his advocacy for sustainable energy solutions. Prior to Rockwell International, Moss worked for nearly ten years at the Santa Susana nuclear field laboratory in Simi Hills where he designed and constructed various nuclear power reactors. He focused on developing safer nuclear technologies as the key engineer for testing so-called “critical experiments” (that is, low-power nuclear physics experiments conducted with nuclear reactors that avoid producing large amounts of fission products). This work laid a foundation for broader environmental impacts that he pursued in his later career at Rockwell International. Reflecting on his time at Rockwell, Moss shared, “It was very rewarding too when you spend several months doing a highly theoretical calculation which makes certain assumptions about physical and nuclear properties, and predicts on the basis of these assumptions that a certain result will happen under these unusual circumstances. And then you go out and test it, and indeed that’s exactly what happens.” After acquiring first-hand experience managing nuclear-scale trials and operations, Moss joined the Sierra Club, and his work efforts soon transitioned towards processes that targeted innovative and renewable energy alternatives.

The personal connection Moss had to nature and his growing awareness of environmental issues, such as pollution and habitat destruction, inspired his shift towards engineering solutions that balanced technical progress with environmental preservation. His admiration of the natural environment grew from his youth in a rural setting where he spent most of his time outdoors. Later in life, seeing the effects of pollution in Los Angeles strengthened his belief that engineering should play a role in protecting the environment. Recalling these aspects, especially from his daily commute, Moss shared, “Another influence was the smog in the L.A. Basin. I remember my feelings at the end of the day, usually driving down from the Santa Susana Mountains to the San Fernando Valley and seeing a blanket of smog over the valley. Thinking about living in that polluted environment and how that had to change.” Seeing the impact of pollution firsthand inspired Moss to turn his personal convictions into action by using his engineering knowledge to advocate for environmental protections.

Moss became a prominent figure in the Sierra Club where he leveraged his engineering expertise for environmental advocacy, including influencing key decisions on energy production and infrastructure through a quantitative approach. He encapsulated his values by asserting, “I wanted to know how many pounds, how many tons, how much toxicity, how many people are at risk, what is the probability of distribution for the hazards, the number of people who can be affected by a single incident, and the consequences of that incident.” Moss joined the Sierra Club in 1959, remained active for over fifteen years, and served as the first non-Californian president of the Club from 1973 to 1974. His leadership was characterized by providing data and analytical information to illuminate the economic and environmental trade-offs of energy production and conservation. During his tenure, Moss opposed construction of the Diablo Canyon nuclear power plant in central California, not in opposition to nuclear power, per se, but by emphasizing its potential dangers in an earthquake-prone area as well as concerns about the plant’s long-term sustainability. In Congressional hearings, he also contributed economic analyses to oppose dams in the Grand Canyon, and instead he advocated for nuclear power as a cleaner, more environmentally sustainable, and cost-effective alternative to burning coal or oil. Moss approached this argument by claiming the dams in the Grand Canyon were not necessary for the economic success of the Central Arizona Project (CAP). He shared, “Those dams were not the key factors in subsidizing the Central Arizona Project. One, we did the calculation that the Bureau of Reclamation did and took out both the costs of and the revenue from the two Grand Canyon dams. At the end of the fifty-year period, you ended up with about the same amount of money with the Central Arizona Project subsidized as with the dams in the calculation.” By merging his analytical mind and engineering expertise, Moss played a key role in broadening the Sierra Club’s mission, helping shift its focus from strictly wildlife conservation to the broader environmental challenges of his time.

The oral history of Laurence I. Moss offers testimony to the crucial role that engineering and technical expertise can play in creating a safer, more environmentally friendly future.  His integration of engineering and environmental protection inspires future generations of engineers like me, who hope to contribute to the sustainable engineering industry. Moss’s life, work, and advocacy emphasized deep interconnections between economics, engineering, and environmental action. He serves as a lasting source of inspiration for students and professionals who share his values in the ongoing pursuit for a healthier planet.

Laurence I. Moss, “Laurence I. Moss: Sierra Club President, 1973-1974, Nuclear Engineer: Energy and Environmental Policy,” interview conducted by Ann Lage in 1992, Oral History Center, The Bancroft Library, University of California, Berkeley, 2014.

ABOUT THE ORAL HISTORY CENTER

The Oral History Center of The Bancroft Library preserves voices of people from all walks of life, with varying political perspectives, national origins, and ethnic backgrounds. We are committed to open access and our oral histories and interpretive materials are available online at no cost to scholars and the public. You can find our oral histories from the search feature on our home page. Search by name, keyword, and several other criteria. Sign up for our monthly newsletter  featuring think pieces, new releases, podcasts, Q&As, and everything oral history. Access the most recent articles from our home page or go straight to our blog home.

Please consider making a tax-deductible donation to the Oral History Center if you’d like to see more work like this conducted and made freely available online. As a soft-money research unit of The Bancroft Library, the Oral History Center must raise outside funding to cover its operational costs for conducting, processing, and preserving its oral history work, including the salaries of its interviewers and staff, which are not covered by the university. You can give online, or contact us at ohc@berkeley.edu for more information about our funding needs for present and future projects.


A&H Data: Designing Visualizations in Google Maps

This map shows the locations of the bookstores, printers, and publishers in San Francisco in 1955 according to Polk’s Directory (SFPL link). The map highlights the quantity thereof as well as their centrality in the downtown. That number combined with location suggests that publishing was a thriving industry.

Using my 1955 publishing dataset in Google My Maps (https://www.google.com/maps/d) I have linked the directory addresses of those business categories with a contemporary street map and used different colors to highlight the different types. The contemporary street map allows people to get a sense of how the old data compares to what they know (if anything) about the modern city.

My initial Google My Map, however, was a bit hard to see because of the lack of contrast between my points as well as how they blended in with the base map. One of the things that I like to keep in mind when working with digital tools is that I can often change things. Here, I’m going to poke at and modify my

  • Base map
  • Point colors
  • Information panels
  • Sharing settings

My goal in doing so is to make the information I want to understand for my research more visible. I want, for example, to be able to easily differentiate between the 1955 publishing and printing houses versus booksellers. Here, contrasting against the above, is the map from the last post:

Image of the My Maps backend with pins from the 1955-56 Polk directory, colors indicating publishers or booksellers.
Click on the map for the last post in this series.

Quick Reminder About the Initial Map

To map data with geographic coordinates, one needs to head to a GIS program (US.gov discussion of). In part because I didn’t yet have the latitude and longitude coordinates filled in, I headed over to Google My Maps. I wrote about this last post, so I shan’t go into much detail. Briefly, those steps included:

    1. Logging into Google My Maps (https://www.google.com/maps/d/)
    2. Clicking the “Create a New Map” button
    3. Uploading the data as a CSV sheet (or attaching a Google Sheet)
    4. Naming the Map something relevant
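
For the upload in step 3, the data just needs to be a plain table with a header row and at least one column that My Maps can use to place points, such as an address. Below is a minimal sketch of writing such a sheet with Python's csv module. The rows are placeholders invented for illustration, not entries from Polk's Directory, and apart from “section” the column names are assumptions about how one might lay out the sheet.

```python
import csv
import io

# Assumed column layout; only "section" is named in this post.
FIELDS = ["name", "section", "address", "city", "state", "year"]

# Placeholder rows, invented for illustration only.
rows = [
    {"name": "Example Press", "section": "Publishers",
     "address": "100 Example St", "city": "San Francisco",
     "state": "California", "year": "1955"},
    {"name": "Example Bookshop", "section": "Books-Retail",
     "address": "200 Sample Ave", "city": "San Francisco",
     "state": "California", "year": "1955"},
]

def to_csv(records, fields=FIELDS):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

csv_text = to_csv(rows)
print(csv_text)
```

Saving csv_text to a file ending in .csv gives something that can be uploaded in step 3; a Google Sheet with the same columns should work the same way.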

Now that I have the map, I want to make the initial conclusions within my work from a couple weeks ago stand out. To do that, I logged back into My Maps and opened up the saved “Bay Area Publishers 1955.”

Base Map

One of the reasons that Google can provide My Maps at no direct charge is its advertising revenue. To create an effective visual, I want to be able to identify what information I have without losing my data among all the ads.

Grid of nine possible base maps for use in Google Maps. The small squares suggest different color balances and labels.
Screen capture from 2024 showing thumbnails for possible base map design.

To move in that direction, I head over to the My Map edit panel where there is a “Base map” option with a down arrow. Hitting that down arrow, I am presented with an option of nine different maps. What works for me at any given moment depends on the type of information I want my data paired with.

The default for Google Maps is a street map. That street map emphasizes business locations and roads so that people can look up directions. Some of Google My Maps’ other options focus on geographic features, such as mountains or oceans. Because I’m interested in San Francisco publishing, I want a sense of the urban landscape and proximity. I don’t particularly need a map focused on ocean currents. What I do want is a street map with dimmer colors than Google’s standard base map so that my data layer is distinguishable from Google’s landmarks, stores, and other points of interest.

Nonetheless, when there are only nine maps available, I like to try them all. I love maps and enjoy seeing the different options, colors, and features, despite the fact that I already know these maps well.

The options that I’m actually considering are “Light Political” (center left in the grid), “Mono City” (center of the grid), or “White Water” (bottom right). These base map options offer the lighter-toned background I want, which allows my dataset points to stand out clearly against them.

For me, “Light Political” is too pale. With white streets on light gray, the streets end up sinking into the background, losing some of the urban landscape that I’m interested in. The bright, light blue of the ocean also draws attention away from the city and toward the border, which is precisely what it wants to do as a political map.

I like “Mono City” better as it allows my points to pop against a pale background while the ocean doesn’t draw focus to the border.

Of these options, however, I’m going to go with the “White Water” street map. Here, the city is done up with various grays and oranges, warming the map in contrast to “Mono City.” The particular style also adds detail to some of the geographic landmarks, drawing attention to the city as a lived space. Consequently, even though the white water creeps me out a bit, this map gets closest to what I want in my research’s message. I also know that for this data set, I can arrange the map zoom to limit the amount of water displayed on the screen.

Point colors

Now that I’ve got my base map, I’m on to choosing point colors. I want them to reflect my main research interests, but I’ve also got to pick within the scope of the limited options that Google provides.

Google My Map 30 color options above grid of symbols one can use for data points across map.
Color choices and symbols one can use for points as of 2024.

I head over to the Edit/Data pane in the My Maps interface. There, I can “Style” the dataset. Specifically, I can tell the GIS program to color my markers by the information in any one of my columns. I could have points all colored by year (here, 1955) or state (California), rendering them monochromatic. I could go by latitude or name and individually select a color for each point. If I did that, I’d run up against Google’s limited, 30-color palette and end up with lots of random point colors before Google defaulted to coloring the rest gray.
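
One quick way to decide which column to style by is to count how many different values each column holds and compare that against the 30-color palette. The sketch below does that for a made-up dataset; the rows are generated placeholders, and the helper names are my own.

```python
PALETTE_SIZE = 30  # number of color options mentioned above

def distinct_counts(records):
    """Count the distinct values in each column of a list of row dicts."""
    return {column: len({row[column] for row in records})
            for column in records[0]}

def fits_palette(records, column, palette_size=PALETTE_SIZE):
    """True when every distinct value in the column could get its own color."""
    return len({row[column] for row in records}) <= palette_size

# Generated placeholder rows, not the real directory data.
sections = [
    "Printers-Book and Commercial", "Publishers",
    "Books-Retail", "Books-Second Hand", "Books-Wholesale",
]
records = [
    {"name": f"Business {i}", "section": sections[i % 5], "year": "1955"}
    for i in range(40)
]

print(distinct_counts(records))
print(fits_palette(records, "section"), fits_palette(records, "name"))
```

A column with one value (like year) gives a single color, a column with a unique value per row (like name) overflows the palette, and a column with a handful of categories sits in the useful middle.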

What I choose here is the type of business, which is listed under the column labeled “section.”

In that column, I have publishers, printers, and three different types of booksellers:

  • Printers-Book and Commercial
  • Publishers
  • Books-Retail
  • Books-Second Hand
  • Books-Wholesale

To make these stand out nicely against my base map, I choose contrasting colors, which are an easy way to make one bit of information stand out against another.

In this situation, my chosen base map has quite a bit of light gray and orange. Glancing at my handy color wheel, I can see purples are opposite the oranges. Looking at the purples in Google’s options, I choose a darker one for Printers to contrast with the light map. That’s one down.

For the next, I want Publishers to complement Printers but be a clearly separate category. To meet that goal, I pick a darker purply-blue shade.

Moving to Books-Retail, I want them to stand as a separate category from the Printers and Publishers. I want them to complement my purples and still stand out against the grays and oranges. To do that, I go for one of Google’s dark greens.

Looking at the last two categories, I don’t particularly care if people can immediately differentiate the second-hand or wholesale bookstores from the retail category. Having too many colors can also be distracting. To keep the message uncluttered, I’m going to make all the bookstores the same color.
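The resulting scheme amounts to a small lookup table. The sketch below writes it out in Python purely as a planning aid. My Maps styling is done by hand in the interface, and the color labels here are descriptive placeholders rather than Google’s own palette names. `STYLE_GROUPS` and `color_for` are hypothetical names.

```python
# Section values as listed in the "section" column; color labels are placeholders.
STYLE_GROUPS = {
    "Printers-Book and Commercial": "dark purple",
    "Publishers": "dark purple-blue",
    "Books-Retail": "dark green",
    "Books-Second Hand": "dark green",  # same color as retail on purpose
    "Books-Wholesale": "dark green",    # same color as retail on purpose
}

def color_for(section):
    """Return the planned color for a section, or gray for anything unlisted."""
    return STYLE_GROUPS.get(section, "gray")

# Five categories collapse to three colors on the map.
print(sorted(set(STYLE_GROUPS.values())))
```

Writing the table out makes it easy to confirm that five categories end up as only three colors before clicking through each one in the Style menu.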

Pop-ups/Information Dock

Google My Map editing popup showing rows from dataset as a form.
Example of editable data from data sheet row.

For this dataset, the pop-ups are not overly important. What matters for my argument here is the spread. Nonetheless, I want to be aware of what people will see if they click on my different data points.

[Citylights pop-up right]

In this shot, I have an example of what other people will see. Essentially, it’s all of the columns converted to a single-entry form. I can edit these if desired and—importantly—add things like latitude and longitude.

The easiest way to drop information from the pop-up is to delete the column from the data sheet and re-import the data.
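For a large sheet, the column can also be removed with a few lines of code before re-importing. The following is a minimal sketch using Python’s standard `csv` module. The function name `drop_columns`, the header names, and the sample row are all made up for illustration, so the real sheet’s headers would need to be substituted.

```python
import csv
import io

def drop_columns(csv_text, unwanted):
    """Return csv_text with every column named in `unwanted` removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in (reader.fieldnames or []) if name not in unwanted]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped columns in each row.
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

# Hypothetical sheet with one illustrative row.
SAMPLE = "name,section,year,state\nExample Books,Books-Retail,1955,California\n"

print(drop_columns(SAMPLE, {"state"}))
```

The returned text can be saved as a new CSV file and imported into a fresh layer, leaving the original sheet untouched.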

Sharing

As I finish up my map, I need to decide whether I want to keep it private (the default) or share it. I keep some of my maps private because they’re lists of favorite restaurants or loosely planned vacations. For example, a sibling is planning on getting married in Cadiz, Spain, and I have a map tagging places I am considering for my travel itinerary.

Toggles toward the top in blue and a close button toward the bottom for saving changes.
“Share map” pop up with options for making a map available.

Here, in contrast, I want friends and fellow interested parties to be able to see it and find it. To make sure that’s possible, I click on “Share” above my layers. On the pop-up (as figured here), I switch the toggles to allow “Anyone with this link [to] view” and “Let others search for and find this map on the internet.” The latter, in theory, will permit people searching for 1955 publishing data in San Francisco to find my beautiful, high-contrast map.

Important: This is also where I can find the link to share the published version of the map. If I pulled the link from the top of my window instead, I’d share the editable version. Be aware, however, that the editable and public versions look a pinch different. As embedded at the top of this post, the published version does not allow the viewer to edit the material and has a sidebar for showing my information, as opposed to the edit view’s pop-ups.

Next steps

To understand how those institutions sat in the 1950s world, I’d like to see how these points align with a 1950s map of San Francisco. To do that, I’d need to find an appropriate map and add it as a layer under my dataset. At this time, however, Google Maps does not allow me to add image and/or map layers. So, in two weeks I’ll write about importing image layers into Esri’s ArcGIS.


Digital Archives and the DH Working Group on Nov. 4

To my delight, I can now announce that the next Digital Humanities Working Group at UC Berkeley is November 4 at 1pm in Doe Library, Room 223.

For the workshop, we have two amazing speakers for lightning talks. They are:

Danny Benett, MA Student in Folklore, will discuss the Berkeley folklore archive, which is making ~500,000 folklore items digitally accessible.

Adrienne Serra, Digital Projects Archivist at The Bancroft Library, will demo an interactive map in ArcGIS allowing users to explore digital collections about the Spanish and Mexican Land grants in California.

We hope to see you there! Do consider signing up (link) as we order pizza and like to have loose numbers.

Flyer with D-Lab and Data & Digital Scholarship’s Digital Humanities Working Group, November 4 @ 1pm session.

The UC Berkeley Digital Humanities Working Group is a research community founded to facilitate interdisciplinary conversations in digital humanities and cultural analytics. It is a welcoming and supportive community for all things digital humanities.

The event is co-sponsored by the D-Lab and Data & Digital Scholarship Services.