A&H Data: Bay Area Publishing and Structured Data

Last post, I promised to talk about using structured data with a dataset focused on 1950s Bay Area publishing. To get into that topic, I’m going to talk about 1) setting out with a research question, 2) data discovery, and 3) data organization, in order to do 4) initial mapping.

Background to my Research

When I moved to the Bay Area, I (your illustrious Literatures and Digital Humanities Librarian) started exploring UC Berkeley’s collections. I wandered through the Doe Library’s circulating collections and started talking to our Bancroft staff about the special library and archive’s foci. As expected, one of UC Berkeley’s collecting areas is California publishing, with a special emphasis on poetry.

Allen Ginsberg depicted with wings in copy for a promotional piece.
Mock-up of ad for books by Allen Ginsberg, City Lights Books Records, 1953-1970, Bancroft Library.

In fact, some of Bancroft’s oft-used materials are the City Lights Books collections (link to finding aids in the Online Archive of California), which include some of Allen Ginsberg’s pre-publication drafts of “Howl” and original copies of Howl and Other Poems. You may already know about that poem because you like poetry, or because you watch everything with Daniel Radcliffe in it (IMDB on the 2013 Kill Your Darlings). This is, after all, the very poem that led to the seminal trial that influenced U.S. free speech and obscenity laws (often called The Howl Obscenity Trial). The Bancroft collections have quite a bit about that trial, as well as some of Ginsberg’s correspondence with Lawrence Ferlinghetti (poet, bookstore owner, and publisher) during the harrowing legal case. (You can find a 2001 discussion with Ferlinghetti on the subject here.)

Research Question

Interested in learning more about Bay Area publishing in general and the period in which Ginsberg’s book was written in particular, I decided to look into the Bay Area publishing environment during the 1950s and now (2020s), starting with the early period. I wanted a better sense of the environment in general as well as public access to books, pamphlets, and other printed material. In particular, I wanted to start with the number of publishers and where they were.

Data Discovery

For the non-digital era of the late 19th and 20th centuries, one of the easiest places to start getting a sense of mainstream businesses is city directories. In an era of mass printing and industrialization, there was a sweet spot in which city directories were one of the most reliable sources of this kind of information. The directory companies were dedicated to finding out as much as possible about what was in different urban areas and where men and businesses were located. As guides to finding businesses, people, and places, the directories were organized in clear, columned text, highly standardized and structured to promote usability.

Raised in an era during which city directories were still a normal thing to have at home, I already knew these fat books existed. Accordingly, I set out to find copies of the directories from the 1950s, when “Howl” first appeared. If I hadn’t already known, I might have reached out to my librarian for suggestions (for you, that might be me).

I knew that the best places to find material like city directories are usually city libraries or historical societies. I could have gone straight to the San Francisco Public Library’s website to see if they had the directories, but I decided to go to Google (i.e., a giant web index) and search for (historic san francisco city directories). That search took me straight to the SFPL’s San Francisco City Directories Online (link here).

On the site, I selected the volumes I was interested in, starting with Polk’s Directory for 1955-56. The SFPL pages shot me over to the Internet Archive and I downloaded the volumes I wanted from there.

Once the directory was on my computer, I opened it and took a look through the “yellow pages” (i.e., pages with information sorted by business type) for “publishers.”

Page from a city directory with columns of company names and corresponding addresses.
Note the dense columns of text almost overlap. From R.L. Polk & Co, Polk’s San Francisco City Directory, vol. 1955–1956 (San Francisco, Calif. : R.L. Polk & Co., 1955), Internet Archive. | Public Domain.

Glancing through the listings, I noted that the records for “publishers” did not list City Lights Books. Flipping back to “book sellers,” I found it. That suggested that other booksellers could be publishers as well. And, regardless, those booksellers were spaces where an audience could acquire books (shocker!) and were therefore relevant. With that in mind, I also looked at the list for “printers,” in part to capture some of the self-publishing spaces.

I now had three structured lists from one directory with dozens of names. Yet, because the lists sat far apart in the book and couldn’t be reorganized, they were difficult to consider together. Furthermore, I couldn’t map them in the structure the directory provided. In order to do what I wanted with them (i.e., meet my research goals), I needed to transform them into a machine-readable dataset.

Creating a Data Set

Machine Readable

I started by doing a one-to-one copy. I took the three lists published in the directory, ran OCR across them in Adobe Acrobat Professional (UC Berkeley has a subscription; for open-access alternatives I recommend Transkribus or Tesseract), and then copied the relevant columns into a Word document.

Data Cleaning

The OCR copy of the list was a horrifying mess, with misspellings, cut-off words, Ss read as 8s, and more. Because this was a relatively small amount of data, I took the time to clean the text manually. Specifically, I corrected typos and then set up the text to work with in Excel (Google Sheets would also have worked) by:

  • creating line breaks between entries, and
  • putting tabs between the name of each institution and its corresponding address.
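Because the list was short, correcting it by hand made sense. For a longer list, some of the repetitive fixes could be scripted first. Below is a minimal Python sketch of one such fix, not the workflow I used. It targets only the S-read-as-8 problem. The rule is my own assumption: an 8 becomes an S only inside a word that otherwise has letters and no other digits, and ordinals like “8th” are skipped. Anything it changes should still be checked against the page image.

```python
import re

# A "word" here is any run of letters and digits.
_WORD = re.compile(r"[A-Za-z0-9]+")
# Ordinals such as "8th" or "18th" are real street numbers; leave them alone.
_ORDINAL = re.compile(r"\d+(st|nd|rd|th)", re.IGNORECASE)
_EXTRA_SPACES = re.compile(r" {2,}")


def _fix_token(match: re.Match) -> str:
    """Replace 8s that OCR produced in place of S/s inside one word."""
    token = match.group(0)
    letters = [c for c in token if c.isalpha()]
    if not letters or "8" not in token or _ORDINAL.fullmatch(token):
        return token  # pure numbers, ordinals, or nothing to fix
    if any(c.isdigit() and c != "8" for c in token):
        return token  # mixed digits (e.g. "2B8") are too ambiguous to guess
    all_caps = all(c.isupper() for c in letters)
    fixed = []
    for i, c in enumerate(token):
        if c == "8":
            # Capitalize at the start of a word or inside an all-caps word.
            c = "S" if all_caps or i == 0 else "s"
        fixed.append(c)
    return "".join(fixed)


def clean_ocr_line(line: str) -> str:
    """Apply conservative OCR fixes to one line; tabs between columns survive."""
    line = _WORD.sub(_fix_token, line)
    return _EXTRA_SPACES.sub(" ", line).strip()
```

For example, `clean_ocr_line("Example Pre88\t500 Mi88ion 8t")` gives back `"Example Press\t500 Mission St"` (a made-up entry). House numbers and ordinals pass through unchanged.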

Once I’d cleaned the data, I copied the text into Excel. The line breaks told Excel where to break rows, and the tabs told it where to break columns. That meant:

  • Each institution had its own row.
  • The names of the institutions and their addresses were in different columns.

Having that information in different spaces would allow me to sort the material either by address or back to its original organization by company name.
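If you would rather do the same split-and-sort outside a spreadsheet, here is a minimal Python sketch using the standard `csv` module. It assumes the cleaned text follows exactly the conventions above: one entry per line, with a tab between name and address. The sample entries are made up for illustration. The address sort key (street name first, then house number) is my own choice.

```python
from __future__ import annotations

import csv
import io


def parse_listing(text: str) -> list[list[str]]:
    """Line breaks become rows; tabs become columns. Blank lines are skipped."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]


def address_key(address: str) -> tuple[str, int]:
    """Sort by street name, then numerically by house number when there is one."""
    number, _, street = address.partition(" ")
    if number.isdigit():
        return (street.lower(), int(number))
    return (address.lower(), 0)


# Hypothetical cleaned text (not real directory entries).
sample = (
    "Example Press\t500 Mission St\n"
    "\n"
    "Sample Books\t120 Mission St\n"
    "Demo Printers\t45 Columbus Av\n"
)
rows = parse_listing(sample)
by_address = sorted(rows, key=lambda row: address_key(row[1]))
by_name = sorted(rows, key=lambda row: row[0].lower())
```

Sorting by house number as a number rather than as text keeps “45” ahead of “120” on the same street. A plain alphabetical sort would not.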

Adding Additional Information

I had, however, three different types of institutions (Booksellers, Printers, and Publishers) that I wanted to keep separate. With that in mind, I added a column for EntryType (written as one word because many programs have trouble with column headers that contain spaces) and put the original directory headings into the relevant rows.

Knowing that I also wanted to map the data, I added a column for “City” and another for “State,” as the GIS (i.e., mapping) programs I planned to use wouldn’t automatically know which urban areas I meant. For these, I wrote the name of the city (i.e., “San Francisco”) and the state (i.e., “California”) in their respective columns and autofilled the information.
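The same column additions can be expressed in a few lines of Python. This sketch assumes each section has already been parsed into [name, address] pairs. The `Company` and `Address` keys, the function name, and the sample entries are illustrative assumptions. EntryType, City, and State follow the column names described above.

```python
from __future__ import annotations


def add_columns(sections: dict[str, list[list[str]]],
                city: str = "San Francisco",
                state: str = "California") -> list[dict[str, str]]:
    """Flatten per-section [name, address] rows into records with extra columns."""
    records = []
    for entry_type, rows in sections.items():
        for name, address in rows:
            records.append({
                "Company": name,
                "Address": address,
                "EntryType": entry_type,  # the original directory heading
                "City": city,             # same value on every row, like autofill
                "State": state,
            })
    return records


# Hypothetical input: one made-up entry per directory heading.
sections = {
    "Booksellers": [["Sample Books", "120 Mission St"]],
    "Printers": [["Demo Printers", "45 Columbus Av"]],
    "Publishers": [["Example Press", "500 Mission St"]],
}
records = add_columns(sections)
```

Keeping the heading in its own column, rather than in three separate files, means the three lists can be sorted and mapped together while still being filterable by type.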

Next, for record-keeping purposes, I added columns for where I got the information, the page I got it from, and the URL where I downloaded it. That information served both as a reminder for me and as a pointer for anyone else who might want to look at the data and see the source directly.

I put in a column for Org/ID for later comparative use (I’ll talk more about this one in a future post) and then added columns for Latitude and Longitude for eventual use.

Spreadsheet of company names and corresponding addresses transcribed from the city directory.
The column headers here are: Years; Section; Company; Address; City; State; PhoneNumber; Latitude; Longitude; Org; Title; PageNumber; Repository; URL. Click on the chart to see the file.

Finally, I saved my data with a filename that would make it easy to find again. In this case, I named it “BayAreaPublishers1955.” I saved the data both as an Excel file (i.e., .xlsx) for use and as a Comma Separated Values file (i.e., .csv) for preservation. I also uploaded the file into Google Drive as a Google Sheet so you could look at it.
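For the CSV half of that step, a minimal Python sketch with the standard `csv` module might look like the following. The header list is copied from the chart caption above. I’m assuming that Section plays the role of the EntryType column there. Writing the .xlsx copy would need a third-party library, so it is left out.

```python
import csv

# Column headers as listed in the chart caption.
FIELDNAMES = [
    "Years", "Section", "Company", "Address", "City", "State", "PhoneNumber",
    "Latitude", "Longitude", "Org", "Title", "PageNumber", "Repository", "URL",
]


def save_csv(records: list, path: str) -> None:
    """Write one row per record; any column a record lacks is left blank."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, restval="")
        writer.writeheader()
        writer.writerows(records)
```

A call such as `save_csv(records, "BayAreaPublishers1955.csv")` writes the header row followed by one row per record. Columns still to be filled in later, like Latitude and Longitude, come out empty rather than missing, which keeps every row the same width.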

Initial Mapping of the Data

With that clean dataset, I headed over to Google’s My Maps (mymaps.google.com) to check that my dataset looked good and didn’t place locations in Los Angeles or elsewhere. I chose Google My Maps for my test because it is one of the easiest GIS programs to use:

  1. many people are already used to the Google interface;
  2. the program will look up latitude and longitude based on address; and
  3. it’s one of the most restrictive, meaning users don’t get overwhelmed with options.
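My Maps does the geocoding itself. Once the Latitude and Longitude columns are filled in, though, the same “no pins in Los Angeles” check can also be run on the data directly. This Python sketch flags records whose coordinates are missing or fall outside a rough rectangle around San Francisco. The box edges are my own approximation, not an official boundary, and would need widening for the wider Bay Area.

```python
# Rough rectangle around San Francisco (an approximation; widen for the Bay Area).
SF_BOX = {"min_lat": 37.70, "max_lat": 37.84, "min_lon": -122.53, "max_lon": -122.35}


def outside_box(records: list, box: dict = SF_BOX) -> list:
    """Return records whose coordinates are missing or fall outside the box."""
    flagged = []
    for rec in records:
        try:
            lat = float(rec["Latitude"])
            lon = float(rec["Longitude"])
        except (KeyError, TypeError, ValueError):
            flagged.append(rec)  # blank, missing, or unparseable coordinates
            continue
        inside = (box["min_lat"] <= lat <= box["max_lat"]
                  and box["min_lon"] <= lon <= box["max_lon"])
        if not inside:
            flagged.append(rec)
    return flagged
```

An empty result means every record has coordinates inside the box. Anything returned is worth checking by hand, since a geocoder can match a street name to the wrong city.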

In My Maps, I created a new map by clicking the “Create a new map” icon in the upper left-hand corner of the interface.

From there, I uploaded my CSV file as a layer. Take a look at the resulting map:

Image of the My Maps backend with pins from the 1955-56 Polk directory, colors indicating publishers or booksellers.
Click on the map for an interactive version. Note that I’ve set the pins to differ in color by “type.”

The visualization highlights how centralized the 1955 San Francisco publishing world was, with its concentration of publishing companies and bookstores around Mission Street. Buying books also necessitated going downtown, but once there, a world of information was at one’s fingertips.
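To back a visual impression like the Mission Street cluster with numbers, one could tally entries by street. Here is a minimal Python sketch that assumes each address starts with a house number followed by the street name. Real directory abbreviations would need normalizing first, and the sample addresses are invented.

```python
from collections import Counter


def street_of(address: str) -> str:
    """Drop a leading house number, e.g. "500 Mission St" -> "Mission St"."""
    number, _, street = address.strip().partition(" ")
    return street if number.isdigit() and street else address.strip()


def count_streets(addresses: list) -> Counter:
    """Count how many entries fall on each street."""
    return Counter(street_of(a) for a in addresses)


# Hypothetical addresses, not taken from the directory.
counts = count_streets(["500 Mission St", "120 Mission St", "45 Columbus Av"])
```

`counts.most_common()` then lists streets from busiest to quietest. This is a rough complement to the map, not a replacement for looking at where the pins actually fall.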

Add in information gleaned from scholarship and other sources about book imports, custom houses, and post offices, and one can start to think about international book trades and how San Francisco was hooked into them.

I’ll talk more about how to use Google’s My Maps in the next post in two weeks!


A&H Data: What even is data in the Arts & Humanities?

This is the first of a multi-part series exploring the idea and use of data in the Arts & Humanities. For more information, check out the UC Berkeley Library’s Data and Digital Scholarship page.

Arts & Humanities researchers work with data constantly. But, what is it?

Part of the trick in talking about “data” in regard to the humanities is that we are already working with it. The books and letters we read (including the one below) are data, as are the pictures we look at and the videos we watch. In short, arts and humanities researchers are already analyzing data for the essays, articles, and books that they write. Furthermore, the resulting scholarship is data.

For example, the letter below from Bancroft Library’s 1906 San Francisco Earthquake and Fire Digital Collection on Calisphere is data.

Blue-ink handwriting on sepia-toned paper; semi-structured organization visible in the date, addressee, etc.

George Cooper Pardee, “Aid for San Francisco: Letter from the Mayor in Oregon,” April 24, 1906, UC Berkeley, Bancroft Library on Calisphere.


One ends up with the question “what isn’t data?”

The broad nature of what “data” is means that instead of asking whether something is data, it can be more useful to think about what kind of data one is working with. After all, scholars work differently with geographic information, metadata (i.e., data about data), publishing statistics, and photographs.

Another helpful step is to consider how structured the data is. In particular, you should pay attention to whether the data is:

  • unstructured
  • semi-structured
  • structured

The level of structure tells us how to treat the data before we analyze it. If, for example, you have hundreds of images you want to work with, you’ll likely have to do a significant amount of work before you can analyze them, because most photographs are unstructured.

photograph of adorable ceramic hedgehog

For example, with this picture of a ceramic hedgehog, the adorable animal, the photograph, and the metadata for the photograph are all different kinds of data. Image: Zde, Ceramic Rhyton in the Form of a Hedgehog, 14. to 13. century BCE, Photograph, March 15, 2014, Wikimedia Commons. | Creative Commons Attribution-Share Alike 3.0 Unported.


In contrast, the letter toward the top of this post is semi-structured. It is laid out in a typical, physical letter style with information about who, where, when, and what was involved. Each piece of information, in turn, is placed in standardized locations for easy consumption and analysis. Still, to work with the letter and its fellows online, one would likely want to create a structured counterpart.
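As a small illustration of what such a structured counterpart might look like, here is the citation for the Pardee letter turned into a one-row CSV in Python. The field names are my own hypothetical choices, not Calisphere’s metadata schema. A real project would likely add more fields drawn from the letter itself.

```python
import csv
import io

# Hypothetical field names; the values come from the letter's citation.
letter = {
    "Creator": "George Cooper Pardee",
    "Title": "Aid for San Francisco: Letter from the Mayor in Oregon",
    "Date": "1906-04-24",  # ISO 8601 so programs can sort and compare dates
    "Repository": "UC Berkeley, Bancroft Library",
    "Platform": "Calisphere",
}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(letter))
writer.writeheader()
writer.writerow(letter)
csv_text = buffer.getvalue()  # the comma in Repository is quoted automatically
```

Each of the letter’s standardized locations (who, when, where) becomes a named column. Many such letters could then share one table and be sorted or filtered together.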

Finally, structured data is usually highly organized and, when online, often in machine-readable chart form. Here, for example, are two pages from the Polk San Francisco City Directory from 1955-1956, with a screenshot of the machine-readable chart from a CSV (comma-separated values) file below them. This data is clearly structured in both forms. One could argue that it must be, as the entire point of a directory is ease of information access and reading. The latter, however, is the one that we can use in different programs on our computers.

Page from San Francisco city directory with columns listing businesses with their addresses.
Page from San Francisco city directory with columns listing businesses with their addresses.
Screenshot of an Excel spreadsheet with publisher addresses in columns.
R.L. Polk & Co, Polk’s San Francisco City Directory, vol. 1955–1956 (San Francisco, Calif. : R.L. Polk & Co., 1955), Internet Archive. | Public Domain.


This post has provided a quick look at what data is for the Arts & Humanities.

The next post will look at what we can do with machine-readable, structured datasets like the publishers’ information. Stay tuned! The post should be up in two weeks.


50 Years in San Francisco’s Mission District: The Archives of Acción Latina

Photographic prints and posters from the archives of Acción Latina and El Tecolote newspaper are now available for research at Bancroft Library, with an online finding aid newly published at the Online Archive of California. This is the result of the dedicated work of Isabel Breskin, an intern in Library and Information Science at the University of Washington. Below we have Isabel’s reflections on the collection, along with snapshots of a few photographs encountered while she arranged and described the files. Organizational records and other materials from Acción Latina will be made available in the coming months. -JAE

A Guest Posting by Isabel Breskin

Acción Latina is a community organization based in San Francisco’s Mission District. The roots of the organization’s work go back to 1970, when San Francisco State University journalism professor Juan Gonzalez launched a newspaper with his students. That newspaper, El Tecolote, is still published bimonthly and is now the longest-running bilingual newspaper in the country. In 1982, volunteers from El Tecolote and New College of California staged the first Encuentro del Canto Popular, a festival celebrating Latin American music. The festival became an annual event; the 41st Encuentro was held in December 2022. 

The Acción Latina and El Tecolote Pictorial Archive contains thousands of photographs, hundreds of posters and artists’ prints, as well as negatives, slides, cartoons and other drawings, and digital images. The photographic print collection and the poster and artists’ print collection are now available to researchers. 

The photographs capture all aspects of life in the Mission beginning around 1970 and continuing into the first decade of the 21st century, as people took to the streets to protest and celebrate, as they went to work and school, played music and danced, painted murals and listened to poetry. I found the photographs of protests particularly compelling — and I think researchers will, too. They are both rich in information about the issues and causes of the times, and moving evidence of the passion and belief that stirred people to action.

Here are just a few snapshots I took as I worked to arrange and rehouse the photographs.

As I’ve been working on the collection I’ve been thinking about all the people involved: the many people who have been part of Acción Latina over the decades, who have lived and worked in the Mission District and have contributed to the vibrancy of its community, the photographers and artists who created these materials, and the people who will now turn to the images and learn from them.

We recently had our first researcher come to use the newly available collection. He was interested in Bay Area events related to the politics and culture of Chile. Among the relevant images in the collection is this photograph.

Protest photograph from the Acción Latina archive: woman with sign placard for human rights in Chile

I am struck by the look on this unknown woman’s face – she looks both tragic and absolutely determined. It is meaningful to me that her decision to go out and protest that day is being preserved in the collection, and is being recognized and honored in the work of scholars.


The Berkeley Remix: Season 3 of the Oral History Center’s Podcast

Listen to Season 3 Episode 1 of our podcast now known as The Berkeley Remix.

First Response: AIDS and Community in San Francisco

This podcast is about the politics of the first encounters with the AIDS epidemic in San Francisco. The six episodes draw from the thirty-five interviews that Sally Smith Hughes conducted in the 1990s. A historian of science at UC Berkeley’s Oral History Office, Sally interviewed doctors, nurses, researchers, public health officials and community-health practitioners to learn about the unique ways that people responded to the epidemic. Although these interviews cover a wide range of topics, including the isolation of the virus HIV and the search for treatments, the interviews we selected for this podcast are more focused on public health, community engagement, and nursing care. Most of the following podcast episodes are about the period from early 1981, when the first reports emerged of an unknown disease that was killing gay men in San Francisco, to 1984 and the development of a new way of caring for people in a hospital setting.

Episode 1 explores what it was like to be gay in San Francisco in the 1960s and 70s, before people became aware of the epidemic.

Visit The Berkeley Remix for release of Episodes 2-6 each Wednesday.


Announcing the Release of the California / San Francisco Fire Departments Oral History Project

The world of firefighting is much more than masked people in uniforms running into burning buildings and rescuing scared cats from trees. While the bravery of firefighters can’t be overestimated, they also work in a complex system that requires constant training and education, a cohesive partnership with local government, extensive procedures and protocols, managerial oversight, effective communication within departments and to the public, acute familiarity with the local and regional environment, and a whole lot of administrative work. The San Francisco Fire Department (SFFD) is a shining example of how people make a civil service operation run and keep people safe. All of these elements, as well as the historic and cultural aspects of the department, are why we chose it as our focus for our California Fire Departments Oral History Project.

The project was originally conceived by Sarah Wheelock, an independent researcher. She wanted to explore several major thematic areas of firefighting in California and she worked with the Oral History Center to do just that. With great sadness we learned that Sarah passed away in 2014 and thus she was unable to see the project through to completion. Taking over the project in 2016, I wanted to honor her original plan and cover the themes that she had outlined. So, I decided to embark on interviews within one department – the SFFD – to document the ways in which they have handled urban fire, climate change, diversity, technological change, and changing demographics.

The SFFD was founded in 1849 and was run by volunteers. It became a paid department, officially integrated into city government, in 1866. The 150th anniversary of the paid department was in 2016, when I was conducting interviews. Given my budget for the project, I was able to interview six people who worked with the SFFD in different capacities. I wanted to include multiple perspectives to understand the organizational, cultural, geographic, economic, and political systems of one of the oldest departments in the country.

The individuals whom I interviewed were able to illustrate many of the themes that I wanted to document, and much more. Among the six people I interviewed were Chief Robert Demmons (the first and only African American chief of the SFFD, who was instrumental in integrating more women and people of color into the SFFD), Bill Koenig (longtime firefighter and co-founder of Guardians of the City and the SFFD Museum), Jim Lee (also a longtime firefighter and co-founder of Guardians of the City and the SFFD Museum), Steve Nakajo (member of the SFFD Fire Commission), Lt. Anne Young (one of the first women hired), and Jonathan Baxter (longtime paramedic and current Public Information Officer).

These interviews work in concert to illustrate day-to-day operations in the stations, administrative duties, how the city of San Francisco and the department work together, the relationship between paramedics and the department, training, equipment, fire science school, the role of unions, the challenges and triumphs of integrating the departments, the public perception of the department, the role of innovation and changing technology, cultural changes in the department, challenges in fire safety particular to the geography of San Francisco, and the hopes for the future of the SFFD.

It is with great excitement that we present the California / San Francisco Fire Departments Oral History Project. I want to give a special thanks to all of the narrators for sharing their stories with me and helping me to document one of the most historically significant fire departments in our country.

This project is dedicated to the memory of Sarah Wheelock. Her California Firefighter oral histories from the 2000s will be released in early 2018. 


Trial: San Francisco Chronicle 1869-1984

The Library has a trial for the NewsBank digital archive of the San Francisco Chronicle, covering 1869-1984. This includes 61 years not covered by our purchase of the ProQuest digitized San Francisco Chronicle.

You can access the paper until November 9 through this link:

http://infoweb.newsbank.com/?db=EANX-NB&s_browseRef=decades/142051F45F422A02/all.xml

Please send your feedback to me at dorner@berkeley.edu.


Event: Visualizing History: Mapping the 1915 San Francisco World’s Fair

Join Bancroft Library in celebrating the 100th anniversary of the 1915 San Francisco Panama-Pacific International Exposition by digitally mapping rarely seen photographs of the world’s fair onto a historic map of the fairgrounds using the Historypin platform.

The event will kick off with a gallery tour by Curator Theresa Salazar of the Bancroft Library’s PPIE exhibit: The Grandeur of a Great Labor: The Building of the Panama Canal and the Panama-Pacific International Exposition, followed by a reception and brief talk in the beautiful Morrison Library.

Participants will then work together to explore archival images of the world’s fair and try to pinpoint the exact locations where the photos were taken. Using maps and guides as you would have 100 years ago, you’ll virtually find your way through “The Zone” and its sometimes-fatal carnival rides, wander through the massive exhibit halls, and marvel at the architecture of the state and country pavilions. History “pinners” will see their results live on the Historypin PPIE site at the event. We guarantee you’ll never see SF’s Marina neighborhood the same way again!

Thursday, November 19, 2015

4:00-4:30 pm – Location: Bancroft Library Gallery
Bancroft Library Gallery Tour with Curator Theresa Salazar – meet in the Bancroft Library lobby (following the gallery tour, participants will be escorted to the Morrison Library for the remainder of the event)

4:30-7:00 pm – Location: Morrison Library
Welcome Reception and Talk by Laura Ackley, author of San Francisco’s Jewel City: The Panama-Pacific International Exposition of 1915.
Demonstration and Pinathon
After a quick tour of the virtual fairgrounds, you’ll have a chance to get hands-on working in groups to help us pin historic images from Bancroft’s collections onto the 1915 fairground map, using clues, fair guides, maps, and more.

Live sharing on Historypin PPIE Site
We will have groups share some of the just-pinned materials live on the Historypin PPIE site. Tell us what you discovered in your time travels!


Event: Bancroft Roundtable: “Before the PPIE: The Mechanics’ Institute and the Development of San Francisco’s ‘Fair Culture,’ 1857-1909.”

Please join us for the second Bancroft Library Roundtable of the fall semester!

It will take place in the Lewis-Latimer Room of The Faculty Club at noon on Thursday, October 15. Taryn Edwards, Librarian/Historian in the Mechanics’ Institute Library and Chess Room, San Francisco, will present “Before the PPIE: The Mechanics’ Institute and the Development of San Francisco’s ‘Fair Culture,’ 1857-1909.”

Between 1857 and 1899, the Mechanics’ Institute hosted thirty-one industrial expositions that displayed and promoted the products of local entrepreneurs and inventors. These expositions bolstered California’s infant economy, encouraged the demand for local goods, and whetted the public’s appetite for elaborate, multi-attraction fairs. Given the Mechanics’ Institute’s vast experience with putting on such spectacles, its members were involved as consultants on larger statewide fairs, including the California Midwinter Fair of 1894, the Golden Jubilee Mining Fair of 1898, the Portola Festival of 1909, and the Panama-Pacific International Exposition in 1915. Ms. Edwards will explore this rich history through a lecture and slideshow.

We hope to see you there.
Crystal Miles and Kathi Neal
Bancroft Library Staff


Event: Bancroft Roundtable: “‘The World’s Best Working Climate’: Modeling Industrial Suburbs on the Edge of San Francisco Bay.”

The last Bancroft Roundtable of the spring semester will take place in the Lewis-Latimer Room of The Faculty Club at noon on Thursday, May 21. Peter Ekman, Bancroft Library Study Award recipient and doctoral candidate in geography at UC Berkeley, will present “‘The World’s Best Working Climate’: Modeling Industrial Suburbs on the Edge of San Francisco Bay.”

Between 1880 and 1940, urban manufacturers, planners, and property developers configured a series of company towns and industrial suburbs just east of San Francisco Bay, stretching from Richmond to Antioch on the shores of the Carquinez Strait. Drawing on visual materials and numerous manuscript collections at the Bancroft, Peter Ekman will discuss how this unfashionable, ostensibly unplanned “middle landscape” came, over time, to serve as a kind of laboratory for new, imitable models of social and spatial order. He will place these experiments within a prehistory, intellectual and material-cultural, of the postwar suburb, and explore their afterlives amid decades of disinvestment.

We hope to see you there.

Crystal Miles, Kathi Neal, and Baiba Strads
Bancroft Library Staff


Event: Bancroft Roundtable: “Counter-institutions are the answer, man!” Multi-Ethnic Publishing in the San Francisco Bay Area in the 1970s.

The next Bancroft Roundtable will take place in the Lewis-Latimer Room of The Faculty Club at noon on Thursday, March 19. Simon Abramowitsch, Bancroft Library Study Award recipient and doctoral candidate in English at UC Davis, will present “Counter-institutions are the answer, man!” Multi-Ethnic Publishing in the San Francisco Bay Area in the 1970s.

In the 1970s, independent publishing in the San Francisco Bay Area was central to the development of multi-ethnic American literature. Writers, editors, and publishers of literary journals and small presses made space for literature by African American, Asian American, Latina/o, and Native American writers, as well as European American writing outside the mainstream. But the development of multi-ethnic literature in the Bay Area was more than an effort to present work by and for single ethnic groups; it suggested and argued for a properly multi-cultural American literature. Ishmael Reed and Al Young’s Yardbird is frequently cited as the exemplar of this movement for the multi-culture, but Yardbird was in fact only one instance of a diverse and complex range of regional efforts in this direction. The talk will discuss the history of this local literary activity by looking at some of the figures and publishing efforts in the Bay Area during the 1970s.

Kathi Neal and Baiba Strads
Bancroft Library Staff