Tag: Internet Archive
Library Leaders Forum 2016
On October 26-28, I had the honor of attending the Library Leaders Forum 2016, which was held at the Internet Archive (IA). This year’s meeting was geared towards envisioning the library of 2020. October 26th was also IA’s 20th anniversary. I joined my Web Science and Digital Libraries (WS-DL) Research Group in celebrating IA’s 20 years of preservation by contributing a blog post with my own personal story, which highlights a side of the importance of Web preservation for the Egyptian Revolution. More personal stories about Web archiving exist on WS-DL blog.
In the Great room at the Internet Archive Brewster Kahle, the Internet Archive’s Founder, kicked off the first day by welcoming the attendees. He began by highlighting the importance of openness, sharing, and collaboration for the next generation. During his speech he raised an important question, “How do we support datasets, the software that come with it, and open access materials?” According to Kahle, the advancement of digital libraries requires collaboration.
After Brewster Kahle’s brief introduction, Wendy Hanamura, the Internet Archive’s Director of Partnership, highlighted parts of the schedule and presented the rules of engagement and communication:
- The rule of 1 – Ask one question answer one question.
- The rule of n – If you are in a group of n people, speak 1/n of the time.
Before giving the microphone to the attendees for their introductions, Hanamura gave a piece of advice, “be honest and bold and take risks“. She then informed the audience that “The Golden Floppy” award shall be given to the attendees who would share bold or honest statements.
Next was our chance to get to know each other through self-introductions. We were supposed to talk about who we are, where we are from and finally, what we want from this meeting or from life itself. The challenge was to do this in four words.
After the introductions, Sylvain Belanger, the Director of Preservation of Library and Archives in Canada, talked about where his organization will be heading in 2020. He mentioned the physical side of the work they do in Canada to show the challenges they experience. They store, preserve, and circulate over 20 million books, 3 million maps, 90,000 films, and 500 sheets of music.
“We cannot do this alone!” Belanger exclaimed. He emphasized how important a partnership is to advance the library field. He mentioned that the Library and Archives in Canada is looking to enhance preservation and access as well as looking for partnerships. They would also like to introduce the idea of innovation into the mindset of their employees. According to Belanger, the Archives’ vision for the year 2020 includes consolidating their expertise as much as they can and also getting to know how do people do their work for digitization and Web archiving.
After the Belanger’s talk, we split up into groups of three to meet other people we didn’t know so that we could exchange knowledge about what we do and where we came from. Then the groups of two will join to form a group of six that will exchange their visions, challenges, and opportunities. Most of the attendees agreed on the need for growth and accessibility of digitized materials. Some of the challenges were funding, ego, power, culture, etc.
Our visions for 2020, challenges, opportunities – vision: growth & accessibility #libraryleaders2016 @internetarchive pic.twitter.com/ePkpKzvRGB
— Yasmina Anwar (@yasmina_anwar) October 27, 2016
Chris Edward, the Head of Digital Services at the Getty Research Institute, talked about what they are doing, where they are going, and the impact of their partnership with the IA. Edward mentioned that the uploads by the IA are harvested by HathiTrust and the Defense Logistics Agency (DLA). This allows them to distribute their materials. Their vision for 2020 is to continue working with the IA and expanding the Getty research portal, and digitize everything they have and make it available for everyone, anywhere, all the time. They also intend on automating metadata generation (OCR, image recognition, object recognition, etc.), making archival collections accessible, and doing 3D digitization of architectural models. They will then join forces with the International Image Interoperability Framework (IIIF) community to develop the capability to represent these objects. He also added that they want to help the people who do not have the ability to do it on their own.
After lunch, Wendy Hanamura walked us quickly through the Archive’s strategic plan for 2015-2020 and IA’s tools and projects. Some of these plans are:
- Next generation Wayback Machine
- Test pilot with Mozilla so they suggest archived pages for the 404
- Wikimedia link rots
- Building libraries together
- The 20 million books
- Table top scribe
- Open library and discovery tool
- Digitization supercenter
- Collaborative circulation system
- Television Archive — Political ads
- Software and emulation
- Proprietary code
- Scientific data and Journals – Sharing data
- Music — 78’s
“No book should be digitized twice!”, this is how Wendy Hanamura ended her talk.
Then we had a chance to put our hands on the new tools by the IA and by their partners through having multiple makers’ space stations. There were plenty of interesting projects, but I focused on the International Research Data Commons– by Karissa McKelvey and Max Ogden from the Dat Project. Dat is a grant-funded project, which introduces open source tools to manage, share, publish, browse, and download research datasets. Dat supports peer-to-peer distribution system, (e.g., BitTorrent). Ogden mentioned that their goal is to generate a tool for data management that is as easy as Dropbox and also has a versioning control system like GIT.
After a break Jeffrey Mackie-Mason, the University Librarian of UC Berkeley, interviewed Brewster Kahle about the future of libraries and online knowledge. The discussion focused on many interesting issues, such as copyrights, digitization, prioritization of archiving materials, cost of preservation, avoiding duplication, accessibility and scale, IA’s plans to improve the Wayback Machine and many other important issues related to digitization and preservation. At the end of the interview, Kahle announced his white paper, which wrote entitled “Transforming Our Libraries into Digital Libraries”, and solicited feedback and suggestions from the audience.
"Dark Archive is one of the WORST ideas ever" @brewster_kahle @internetarchive #libraryleaders2016
— Yasmina Anwar (@yasmina_anwar) October 27, 2016
@waybackmachine is the calling card of the @internetarchive and it has been neglected until recently #libraryleaders2016
— Merrilee Proffitt (@MerrileeIAm) October 27, 2016
"Dark Archive is one of the WORST ideas ever" @brewster_kahle @internetarchive #libraryleaders2016
— Yasmina Anwar (@yasmina_anwar) October 27, 2016
@waybackmachine is the calling card of the @internetarchive and it has been neglected until recently #libraryleaders2016
— Merrilee Proffitt (@MerrileeIAm) October 27, 2016
https://twitter.com/tripofmice/status/791790807736946688
https://twitter.com/tripofmice/status/791786514497671168
Its not what we can get away with, but rather what is the role of libraries in #copyright issues -Brewster K. #libraryleaders2016
— Dr.EB 🇵🇸 (@LNBel) October 27, 2016
At the end of the day, we had an unusual and creative group photo by the great photographer Brad Shirakawa who climbed out on a narrow plank high above the crowd to take our picture.
On day two the first session I attended was a keynote address by Brewster Kahle about his vision for the Internet Archive’s Library of 2020, and what that might mean for all libraries.
"A library is engine for research!!" #libraryleaders2016 pic.twitter.com/D2du0L67T2
— Yasmina Anwar (@yasmina_anwar) October 28, 2016
Heather Christenson, the Program Officer for HathiTrust, talked about where HeathiTrust is heading in 2020. Christenson started by briefly explaining what is HathiTrust and why HathiTrust is important for libraries. Christenson said that HathiTrust’s primary mission is preserving for print and digital collections, improving discovery and access through offering text search and bibliographic data APIs, and generating a comprehensive collection of the US federal documents. Christensen mentioned that they did a survey about their membership and found that people want them to focus on books, videos, and text materials.
Our next session was a panel discussion about the Legal Strategies Practices for libraries by Michelle Wu, the Associate Dean for Library Services and Professor of Law at the Georgetown University Law Center, and Lila Bailey, the Internet Archive’s Outside Legal Counsel. Both speakers shared real-world examples and practices. They mentioned that the law has never been clearer and it has not been safer about digitizing, but the question is about access. They advised the libraries to know the practical steps before going to the institutional council. “Do your homework before you go. Show the usefulness of your work, and have a plan for why you will digitize, how you will distribute, and what you will do with the takedown request.”
After the panel Tom Rieger, the Manager of Digitization Services Section at the Library of Congress (LOC), discussed the 2020 vision for the Library of Congress. Reiger spoke of the LOC’s 2020 strategic plan. He mentioned that their primary mission is to serve the members of Congress, the people in the USA, and the researchers all over the world by providing access to collections and information that can assist them in decision making. To achieve their mission the LOC plans to collect and preserve the born digital materials and provide access to these materials, as well as providing services to people for accessing these materials. They will also migrate all the formats to an easily manageable system and will actively engage in collaboration with many different institutions to empowering the library system, and adapt new methods for fulfilling their mission.
In the evening, there were different workshops about tools and APIs that IA and their partners provided. I was interested in the RDM workshop by Max Ogden and Roger Macdonald. I wanted to explore the ways we can support and integrate this project into the UC Berkeley system. I gained more information about how the DAT project worked through live demo by Ogden. We also learned about the partnership between the Dat Project and the Internet Archive to start storing scientific data and journals at scale.
We then formed into small groups around different topics on our field to discuss what challenges we face and generate a roadmap for the future. I joined the “Long-Term Storage for Research Data Management” group to discuss what the challenges and visions of storing research data and what should libraries and archives do to make research data more useful. We started by introducing ourselves. We had Jefferson Bailey from the Internet Archive, Max Ogden, Karissa from the DAT project, Drew Winget from Stanford libraries, Polina Ilieva from the University of California San Francisco (UCSF), and myself, Yasmin AlNoamany.
Some of the issues and big-picture questions that were addressed during our meeting:
- The long-term storage for the data and what preservation means to researchers.
- What is the threshold for reproducibility?
- What do researchers think about preservation? Does it mean 5 years, 15 years, etc.?
- What is considered as a dataset? Harvard considers anything/any file that can be interpreted as a dataset.
- Do librarians have to understand the data to be able to preserve it?
- What is the difference between storage and preservation? Data can be stored, but long-term preservation needs metadata.
- Do we have to preserve everything? If we open it to the public to deposit their huge datasets, this may result in noise. For the huge datasets what should be preserved and what should not?
- Privacy and legal issues about the data.
Principles of solutions
- We need to teach researchers how to generate metadata and the metadata should be simple and standardized.
- Everything that is related to research reproducibility is important to be preserved.
- Assigning DOIs to datasets is important.
- Secondary research – taking two datasets and combine them to produce something new. In digital humanities, many researchers use old datasets.
- There is a need to fix the 404 links for datasets.
- There is should be an easy way to share data between different institutions.
- Archives should have rules for the metadata that describe the dataset the researchers share.
- The network should be neutral.
- Everyone should be able to host a data.
- Versioning is important.
Notes from the other Listening posts:
- LIBRARY 2020: Refining the vision, mapping the collection, and identifying the contributors
- WEB ARCHIVING: What are the opportunities for collaborative technology building?
- DIGITIZATION: Scanning services–develop a list of key improvements and innovations you desire
- DISCOVERY: Open Library– ideas for moving forward
At the end of the day, Polina Ilieva, the Head of Archives and Special Collections at UCSF, wrapped up the meeting by giving her insight and advice. She mentioned that for accomplishing their 2020 goals and vision, there is a need to collaborate and work together. Ilieva said that the collections should be available and accessible for researchers and everyone, but there is a challenge of assessing who is using these collections and how to quantify the benefits of making these collections available. She announced that they would donate all their microfilms to the Internet Archive! “Let us all work together to build a digital library, serve users, and attract consumers. Library is not only the engine for search, but also an engine for change, let us move forward!” This is how Ilieva ended her speech.
It was an amazing experience to hear about the 2020 vision of the libraries and be among all of the esteemed library leaders I have met. I returned with inspiration and enthusiasm for being a part of this mission and also ideas for collaboration to advance the library mission and serve more people.
–Yasmin AlNoamany