by Kate Tasker (Digital Archivist, Bancroft Library) and Jay Boncodin (Library Computer Infrastructure Services)
In July 2015, the Bancroft Library’s Digital Collections Unit received an intriguing message from UC Berkeley Assistant Professor Alex Saum-Pascual, who teaches in the Spanish and Portuguese Department and with the Berkeley Center for New Media. Alex and her colleague Élika Ortega, a Postdoctoral Researcher at the Institute for Digital Research in the Humanities at the University of Kansas, were working on a Digital Humanities project for electronic literature, and wanted to read a groundbreaking book they had checked out of the UC Berkeley Library: Victory Garden, by Stuart Moulthrop. But this “book” was actually a hypertext novel (now a classic in the genre), and it was stored on a 3.5” floppy disk from 1991.
Alex had heard about the digital forensics work happening at the Bancroft Library. She wanted to know if we could help her read the disk so she could use the novel in her new undergraduate course, and in a collaborative exhibit called “No Legacy: Literatura electrónica (NL || LE)” (opening March 11, 2016 in Doe Library). We were excited to help out with such a cool project, and enlisted Jay Boncodin (a retro-tech enthusiast) from Library Computer Infrastructure Services (LCIS) to investigate this “retro” technology.
The initial plan was to make a copy of the floppy, to be read and displayed on an old Mac Color Classic. The software booklet included with the floppy disk listed instructions for installing the “Storyspace” software on Macintosh computers, so this seemed like a good sign.
Our first step was to make an exact bit-for-bit forensic copy of the original disk as a backup, which would allow us to experiment without risking damage to the original. The Digital Collections Unit routinely works with the Library Systems Office to create disk images of older computer media like 3.5” floppies to support preservation and processing activities, so this was familiar territory. We created a disk image using the free FTK Imager program and an external 3.5” floppy disk drive with a USB connector.
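FTK Imager handles imaging through its own interface, and the post doesn't describe its internals. As a rough illustration of what a bit-for-bit copy involves, here is a minimal Python sketch, assuming the source can be read as a raw file (on Linux, a device path such as /dev/sdb would be a hypothetical example) and that MD5 and SHA-1 are the hashes you want to record. It copies the source in chunks and hashes the bytes as it goes, so the image can later be checked against the recorded values. The function names are hypothetical, and opening the source read-only in software is not a substitute for a hardware write blocker.

```python
import hashlib


def image_device(source_path, image_path, chunk_size=64 * 1024):
    """Copy source_path byte for byte into image_path, hashing as we go.

    Hypothetical helper: source_path may be a raw device (e.g. /dev/sdb on
    Linux, an assumed example) or any file. Opening it read-only here is NOT
    a substitute for a hardware write blocker.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    total = 0
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # reached the end of the source media
                break
            dst.write(chunk)
            md5.update(chunk)
            sha1.update(chunk)
            total += len(chunk)
    return {"bytes": total, "md5": md5.hexdigest(), "sha1": sha1.hexdigest()}


def verify_image(image_path, expected_sha1, chunk_size=64 * 1024):
    """Re-hash the saved image and compare it with the hash recorded at capture."""
    sha1 = hashlib.sha1()
    with open(image_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
    return sha1.hexdigest() == expected_sha1
```

For a standard 1.44 MB high-density floppy, the returned byte count should be 1,474,560 (2,880 sectors of 512 bytes); a different count is worth a second look before relying on the image.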
Next, all we had to do was copy the disk image to a new blank 3.5” floppy, insert it into a floppy disk drive, and voila! We’d have access to a recovered 25-year-old eBook.
…Except the floppy disk drive couldn’t read the floppy, on any of the Macs we tried.
We backed up a step to ensure that nothing had gone wrong with the disk imaging process. We were able to view a list of the floppy disk contents in the FTK Imager program, and extracted copies of the original files for good measure. The file list looked very much like one from a Windows-based program, and we also recognized an executable file (with the .exe extension) which would be familiar to anyone who’s ever downloaded and installed Windows software.
After double-checking the catalog record, we realized that, despite the documentation, this “Macintosh” disk was actually formatted for PCs, so it was definitely not going to run on a Mac.
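The post doesn't say how the PC formatting was confirmed beyond the file list and the catalog record. One quick check, sketched below as a rough heuristic rather than the method used here, is to look at the first sectors of the disk image: classic Mac HFS volumes store a volume header at byte offset 1024 that begins with the signature "BD" (or "H+" for HFS Plus), while DOS/Windows FAT floppies usually end their 512-byte boot sector with the bytes 0x55 0xAA. The function name, return labels, and image filename are hypothetical.

```python
def guess_floppy_format(image):
    """Roughly guess a floppy image's file system from its first 2 KB.

    Heuristic only (hypothetical helper): returns "HFS", "HFS+", "FAT12",
    "FAT", or "unknown".
    """
    # Classic Mac volumes keep their volume header at byte offset 1024;
    # it begins with b"BD" for HFS or b"H+" for HFS Plus.
    signature = image[1024:1026]
    if signature == b"BD":
        return "HFS"
    if signature == b"H+":
        return "HFS+"
    # DOS/Windows boot sectors usually end with 0x55 0xAA at offsets 510-511.
    if image[510:512] == b"\x55\xaa":
        # Many FAT12 floppies also carry a type label such as b"FAT12   " at offset 54.
        if image[54:59] == b"FAT12":
            return "FAT12"
        return "FAT"
    return "unknown"


# Usage, assuming "victory_garden.img" (hypothetical name) is the raw disk image:
# with open("victory_garden.img", "rb") as f:
#     print(guess_floppy_format(f.read(2048)))
```

Checking the Mac signature first is an arbitrary choice; either order is a heuristic, and some very old DOS disks lack the 0x55 0xAA marker altogether.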
Switching gears (and operating systems), we then attempted to read the disk on a machine running Windows 7, but encountered an error stating that the program would only run on a 32-bit operating system. Luckily, Jay had an image of a 32-bit Windows OS handy, so we could test it out.
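The post doesn't say why the program refused to run on Windows 7. One common cause of that kind of error, offered here only as an assumption, is that the executable is a 16-bit program: 64-bit editions of Windows cannot run 16-bit DOS or Windows applications, while 32-bit editions generally still can. A Windows executable's header reveals which kind it is, and the minimal sketch below reads it: every such file starts with "MZ", the 32-bit value at offset 0x3C points to a newer header, and that header begins with "NE" for 16-bit Windows programs or "PE\0\0" for 32- and 64-bit ones. The function name and labels are hypothetical.

```python
import struct


def executable_kind(header):
    """Classify a DOS/Windows executable from its leading bytes (hypothetical helper)."""
    if header[:2] != b"MZ":
        return "not an MZ executable"
    if len(header) < 0x40:
        return "MS-DOS"
    # e_lfanew: little-endian 32-bit offset of the newer header, stored at 0x3C.
    (offset,) = struct.unpack_from("<I", header, 0x3C)
    signature = header[offset:offset + 4]
    if signature == b"PE\x00\x00" and len(header) >= offset + 6:
        # The COFF "Machine" field immediately follows the PE signature.
        (machine,) = struct.unpack_from("<H", header, offset + 4)
        names = {0x014C: "PE 32-bit (x86)", 0x8664: "PE 64-bit (x64)"}
        return names.get(machine, f"PE, machine 0x{machine:04x}")
    if signature[:2] == b"NE":
        return "NE 16-bit Windows"
    # Plain DOS programs have no recognized newer header.
    return "MS-DOS"
```

Reading the first few kilobytes is usually enough, e.g. `executable_kind(open(path, "rb").read(4096))`. A result of "NE 16-bit Windows" or "MS-DOS" would be consistent with an error like the one described above, whereas a 32-bit PE file would normally run on 64-bit Windows as well.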
It took a couple of tries, but we finally were able to read the disk on the 32-bit OS. Victory!
From there it was an eas(ier) task to copy the software files and install the program.
Jay scrounged up a motherboard with a 3.5” floppy drive controller and built a custom PC with an internal 3.5” floppy drive, running the 32-bit Windows operating system. After Victory Garden was installed on this PC, it was handed off to Dave Wong (LCIS), who added the finishing touches by locking down the operating system so that visitors cannot tamper with it.
The machine has just been installed in the Brown Gallery of Doe Library and is housed in a beautiful wood enclosure designed by students in Stephanie F. Lie’s Fall 2015 seminar “New Media 290-003: Archive, Install, Restore” at the Berkeley Center for New Media.
Since one of the themes of the No Legacy: Literatura electrónica project is the challenge of electronic literature preservation, it seems fitting that the recovery of the floppy disk data was not exactly a straightforward process! Success depended on collaboration with people across the UC Berkeley campus, including Alex and her Digital Humanities team, the Berkeley Center for New Media, Library Computer Infrastructure Services, Doe Library staff, and the Bancroft Library’s Digital Collections Unit.
We’re excited to see the final display, and hope you’ll check it out yourself in the Bernice Layne Brown Gallery in Doe Library. The exhibit runs from March 11 – September 2, 2016.
For more information, visit http://nolegacy.berkeley.edu/.
By Kate Tasker and Julie Goldsmith, Bancroft Digital Collections
Last week in the Bancroft’s Digital Collections Unit, we put our new Tableau write blocker to work. Before processing a born-digital collection, a digital archivist must first be able to access and transfer data from the original storage media, often received as hard drives, optical disks and floppy disks. Floppy disks have a mechanism to physically prevent changes to the data during the transfer process, but data on hard drives and USB drives can be easily and irreversibly altered just by connecting the drive to a computer. We must access these drives in write-blocked (or read-only) mode to avoid altering the original metadata (e.g. creation dates, timestamps, and filenames). The original metadata is critical for maintaining the authenticity, security, contextual information, and research value of digital collections.
A write blocker is essentially a one-way street for data; it provides assurance that no changes were made, regardless of user error or software modification. For digital archives, using a write blocker ensures an untampered audit trail of changes that have occurred along the way, which is essential for answering questions about provenance, original order, and chain of custody. As stewards of digital collections, we also have a responsibility to identify and restrict any personally identifiable information (PII) about an individual (Social Security numbers, medical or financial information, etc.) that may be found on computer media. The protected chain of custody is seen as a safeguard for collections that hold these types of sensitive materials.
Other types of data protected by write-blocked transfers include configuration and log files, which update automatically when a drive connects to a system. On a Windows-formatted drive, the registry files can provide information associated with the user, like the last time they logged in and various other account details. Another example: if you lent someone a flash drive and they plugged it into their Mac, they could unintentionally update or install system information on the flash drive, such as a hidden .Spotlight-V100 folder. (Spotlight is the desktop search utility in Mac OS X, and the contents of this folder serve as an index of all files that were on the drive the last time it was used with a Mac.)
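To see whether a drive has already picked up this kind of system residue, one could walk a copy (or a write-blocked mount) of it and flag names that operating systems are known to create. The minimal sketch below assumes a short, non-exhaustive list of such names, including the .Spotlight-V100 folder mentioned above; the list and the function name are our own illustration, not a standard.

```python
from pathlib import Path

# Non-exhaustive, illustrative list of names that operating systems are known
# to write onto attached volumes (our own selection, not a standard).
KNOWN_OS_ARTIFACTS = {
    ".Spotlight-V100": "macOS Spotlight search index",
    ".fseventsd": "macOS file system event log",
    ".Trashes": "macOS per-volume trash",
    ".DS_Store": "macOS Finder folder settings",
    "System Volume Information": "Windows system folder",
}


def find_os_artifacts(root):
    """Return sorted (relative path, description) pairs for likely OS-created items."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        if path.name in KNOWN_OS_ARTIFACTS:
            hits.append((str(path.relative_to(root)), KNOWN_OS_ARTIFACTS[path.name]))
        elif path.name.startswith("._"):
            # AppleDouble files: macOS keeps extra file metadata in these on non-Mac volumes.
            hits.append((str(path.relative_to(root)), "macOS AppleDouble metadata"))
    return sorted(hits)
```

A hit doesn't prove tampering; it only shows the media was attached to a system at some point, which is itself useful context when documenting provenance.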
Write blockers also support fixity checks for digital preservation. We use software programs to calculate a unique identifier for every original file in a collection (digital preservationists call these values checksums; they are generated by cryptographic hash algorithms). Once files have been copied, the same calculations are run on the copies to generate a second set of checksums. If the two sets match, the digital objects are the same, bit for bit, as the originals, without any modification or data degradation.
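A fixity check like the one described above can be sketched in a few lines of Python with the standard hashlib module. The post doesn't name the algorithm used, so SHA-256 is an assumption here (MD5 and SHA-1 are also common in archives), and the function names are hypothetical. The sketch builds a checksum manifest for the original folder and for the copy, then reports files that are missing, extra, or changed.

```python
import hashlib
from pathlib import Path


def checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash one file in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest(root, algorithm="sha256"):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): checksum(p, algorithm)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def fixity_report(original_root, copy_root, algorithm="sha256"):
    """Compare two folders; three empty lists mean the copied contents match bit for bit."""
    before = manifest(original_root, algorithm)
    after = manifest(copy_root, algorithm)
    return {
        "missing": sorted(set(before) - set(after)),
        "extra": sorted(set(after) - set(before)),
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }
```

This compares file contents only; timestamps and other file-system metadata need separate handling, which is one reason the write blocker matters in the first place.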
Once we load the digital collection files into FTK Imager (a free, lightweight version of the Forensic Toolkit, or FTK, a program the FBI uses in criminal data investigations), we can view the folders and files in their original directory structure. We can also easily export a file directory listing, which is an inventory of all the files in the collection with their associated metadata. The file directory listing provides us with specific information about each file (filename, filepath, file size, date created, date accessed, date modified, and checksum) as well as a summary of the entire collection (total number of files, total file size, date range, and contents). It also helps us to make processing decisions, such as whether to capture the entire hard drive as a disk image, or whether to transfer selected folders and files as a logical copy.
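FTK Imager exports its directory listing in its own format, which we don't reproduce here. As a hypothetical stand-in, the sketch below walks a folder and writes a CSV with roughly the per-file fields listed above, then returns a small collection summary. The column names, the choice of MD5, and the function name are assumptions. Note that reading every file to checksum it can update access times on the source, which is exactly why a pass like this belongs behind a write blocker or against a copy.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

# Column names are our own, loosely following the fields described in the post.
FIELDS = ["filename", "filepath", "size_bytes", "created", "accessed", "modified", "md5"]


def _iso(timestamp):
    """Format a POSIX timestamp as ISO 8601 in UTC, or '' if unavailable."""
    if timestamp is None:
        return ""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


def _md5(path, chunk_size=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def directory_listing(root, csv_path):
    """Write a per-file inventory of root to csv_path and return a summary dict."""
    root = Path(root)
    rows, mtimes = [], []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        st = path.stat()
        mtimes.append(st.st_mtime)
        rows.append({
            "filename": path.name,
            "filepath": str(path.relative_to(root)),
            "size_bytes": st.st_size,
            # True creation time is only exposed on some platforms (e.g. macOS);
            # elsewhere the column is left blank rather than guessed.
            "created": _iso(getattr(st, "st_birthtime", None)),
            "accessed": _iso(st.st_atime),
            "modified": _iso(st.st_mtime),
            "md5": _md5(path),
        })
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return {
        "total_files": len(rows),
        "total_bytes": sum(r["size_bytes"] for r in rows),
        "earliest_modified": _iso(min(mtimes)) if mtimes else "",
        "latest_modified": _iso(max(mtimes)) if mtimes else "",
    }
```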
Write blockers are also known in the digital forensics and digital preservation fields as forensic bridges. Our newest piece of equipment is already helping us bridge the gap between preserving original unprocessed computer media and creating open digital collections which are available to all.
For Further Reading:
AIMS Working Group. “AIMS Born-Digital Collections: An InterInstitutional Model for Stewardship.” 2012. http://www.digitalcurationservices.org/files/2013/02/AIMS_final_text.pdf
Gengenbach, Martin J. “‘The Way We Do it Here’: Mapping Digital Forensics Workflows in Collecting Institutions.” A Master’s Paper for the M.S. in L.S. degree. August 2012. http://digitalcurationexchange.org/system/files/gengenbach-forensic-workflows-2012.pdf
Kirschenbaum, Matthew G., Richard Ovenden, and Gabriela Redwine. “Digital Forensics and Born-Digital Content in Cultural Heritage Collections.” Washington, DC: Council on Library and Information Resources, 2010. http://www.clir.org/pubs/reports/pub149
BitCurator Project. http://bitcurator.net
Forensics Wiki. http://www.forensicswiki.org/