The staff at the Internet Archive is fond of Raiders of the Lost Ark, thanks in no small part to the film’s famous final scene: a lowly worker wheeling a crate, which just happens to contain the powerful and face-melting Ark of the Covenant, through a seemingly endless warehouse that contains all the wonders of the world.
It’s a fitting analogy for the Internet Archive, Wendy Hanamura, director of partnerships for the archive, tells me. They’ve spent almost 20 years building an expansive repository of web pages, books, TV, and software. And now, they want to throw the doors open and make it easier for everyone to get involved.
“In some ways, we’re opening up that vast warehouse and saying here are the tools to bring together really meaningful collections and expose them to the world,” said Hanamura.
The Internet Archive is one of 22 projects receiving funding from Knight Foundation through the Knight News Challenge, which is awarding $3 million to projects that provide new tools and ideas for making libraries more accessible. The Internet Archive will get $600,000 to develop new technology to give users more control over how materials are uploaded, categorized, and curated in the archive. [Disclosure: Knight is a funder of Nieman Lab, though not through the News Challenge.]
Right now, the archive holds around 20 petabytes of data, including 500,000 pieces of software, more than 2 million books, 3 million hours of TV, and 430 billion web pages. In a single day, they digitize more than 1,000 books. They capture TV 24 hours a day. In a week, they save more than 1 billion URLs.
As of 2013, only 8 percent of the archive was uploaded by users, some 53,000 people who have accounts with the archive. In order to continue the work of creating “universal access to all knowledge,” as is the archive’s mission, they want to get as many people working on the project as possible.
“To be one of the digital libraries of the future, we’re talking at a scope so far beyond a traditional library you could never have enough reference librarians to do the curatorial work,” Hanamura said.
As Jill Lepore pointed out in a recent profile in The New Yorker, the Internet Archive is doing the crucial work of keeping the web from disappearing:
A footnote used to say, “Here is how I know this and where I found it.” A footnote that’s a link says, “Here is what I used to know and where I once found it, but chances are it’s not there anymore.” It doesn’t matter whether footnotes are your stock-in-trade. Everybody’s in a pinch. Citing a Web page as the source for something you know — using a URL as evidence — is ubiquitous. Many people find themselves doing it three or four times before breakfast and five times more before lunch. What happens when your evidence vanishes by dinnertime?
What the Internet Archive wants to do is not unlike Wikipedia’s reliance on a community of editors to help keep the encyclopedia current. In order to do that, the archive needs to make it easier for people to upload content and to help create the descriptions and catalog metadata that will help others find items.
“We’re going to allow people from across the world to come together to create and curate these collections,” said Hanamura.
Joining and helping the Internet Archive is not a simple task, says Alexis Rossi, the archive’s director of web services. In order to start making a collection you have to email the archive’s customer service department, she said. What they plan to do with the funding from Knight is create a simpler upload system that works across any browser, a contributor management system that lets one or many people work on collections, expanded search functions, and improved tools for organizing what material can be added to certain collections.
For example, a collection of music from the Grateful Dead has rules for how content is added, with specific metadata on the date, venue, and file quality of sound recordings. Rossi said it needs to be easier for individual groups to manage those collections and rules on their own without help from archive staff.
By far the most difficult part of the archive to update is websites, Rossi told me. The archive relies on web crawlers to collect pages, but that becomes harder as sites incorporate more multimedia or applications that are difficult to capture automatically. The result is broken or incomplete pages, which take time to fix by hand, she said. Another growing obstacle: pages that are blocked to web crawlers, either by sites that don’t want to be recorded or by sites like Facebook that have password protection or other privacy controls.
But the point of having users involved in growing the archive’s collections isn’t just to help the staff, but to rely on the legions of expertise that exist across the world. It ultimately makes for a richer, smarter collection, she said. “I don’t think we or anyone are in any position to tell anyone what is important now, or what will be important 100 years from now,” Rossi said.
Here are the rest of the winners in the Knight News Challenge on libraries, which include 14 projects receiving awards through the Prototype Fund, which invests $35,000 in early-stage media and information projects.
Activating the Public Library
Organization: Peer 2 Peer University
Project leads: Philipp Schmidt and Carl Ruppin
Twitter: @chipublib, @p2pu
Online courses can offer people free access to useful content and knowledge, as well as education opportunities. However, the lack of peer support and face-to-face learning options is often a barrier to successful participation, especially for newcomers. To address this issue and increase access to information and education, Peer 2 Peer University, with Chicago Public Library, will organize in-person study groups for patrons to support their free online learning experience. The project will assist participants through the library’s familiar resources and leverage Peer 2 Peer University’s experience in building online communities; it will also include reinforcement and feedback from fellow learners. Study groups will be held in local branch libraries.
Culture in Transit
Organization: Metropolitan New York Library Council
Project lead: Anne Karle-Zenith
Many communities are excluded from the nation’s digital cultural memory because they lack the equipment and technical support to contribute their history to local and national archives. To bridge that gap, two New York public library systems and a citywide libraries and archives membership organization will create a mobile kit, with scanners and cameras, that libraries will take to city branches, so that residents are able to record their historical items. Each item will be housed not only in local digital archives but also in the Digital Public Library of America and other large-scale initiatives such as the Internet Archive, providing worldwide access to local treasures. While these events provide educational and community-building opportunities, they also democratize the process of history-making, allowing people to contribute to and help define their local history. The partners include the Metropolitan New York Library Council (METRO), the Brooklyn Public Library, and the Queens Library, which designed the project together after meeting online via newschallenge.org.
The Internet Archive
Organization: Internet Archive
Project leads: Alexis Rossi and Brewster Kahle
Twitter: @internetarchive, @alexisrossi, @brewster_kahle
The Internet Archive is one of the world’s largest public digital libraries, with an extensive collection of human culture: 2 million books, 430 billion Web pages, 3 million hours of television and more. However, the archive’s users upload only a small percentage of these materials, and to preserve the world’s knowledge, the public should be encouraged to contribute. The archive is embarking on a project to make the archive.org site more community-driven by improving the tools that allow people to upload, describe and organize items. With these new tools, the Internet Archive hopes to democratize knowledge by giving global communities the ability to save, manage and share their cultural treasures for free. What Wikimedia did for encyclopedia articles, the Internet Archive hopes to do for collections of media: give people the tools to build library collections together and make them accessible to everyone.
The Library Freedom Project
Project lead: Alison Macrina
Private companies and the government increasingly control a large part of online communications, and as a result, society is facing a new set of challenges around privacy, surveillance, censorship and free speech. As stewards of information and providers of Internet access, librarians are in a prime position to educate patrons about their digital rights. The Library Freedom Project aims to make real the promise of intellectual freedom in libraries by bringing together a coalition of librarians, technology experts and lawyers to scale a series of privacy workshops for librarians. The workshops will provide librarians and their patrons with tools and information to better understand technology, privacy and law related to use of the Internet.
Digital Library for the Developing World
Organization: Library for All
Project leads: Rebecca MacDonald, Tanyella Evans and Isabel Sheinman
Many developing countries continue to struggle with limited access to information and educational resources, leading to challenges with literacy, civic participation, knowledge building and progress. To address this issue, Library for All will expand its Digital Library, uniquely designed to work in low-bandwidth environments, making educational content available for libraries and schools across the developing world. The platform will leverage mobile technology and be accessible on all devices, including low-cost tablets and $30 feature phones. Content will be culturally relevant and available in local languages.
Measure the Future
Organization: Evenly Distributed
Project lead: Jason Griffey
Twitter: @measure_future, @griffey
While libraries in recent years have created makerspaces to provide access to open technology, this project will help libraries use open hardware devices to improve their own services. Librarian Jason Griffey, a former Knight Foundation Prototype Fund grantee, will train librarians to use open source hardware to better understand the library building itself, one of a branch’s most important assets. The hardware will measure a variety of factors in each room, so that libraries can make better, data-driven decisions on how to use their public spaces.
Open Data to Open Knowledge
Organization: City of Boston
Project lead: Jascha Franklin-Hodge
Boston, like many cities, has published a collection of “open data” that includes everything from building permits to a list of urban farms. The data tells a story about government and city life. Public interest in the data is clear — with more than 14,000 views so far of neighborhood pothole data alone. But as with many forms of knowledge, making something available isn’t the same as making it useful. Through this project the city of Boston will work with local libraries to create a digital data catalog that will make it easier for residents, researchers and public employees to navigate. Once developed, the city and libraries will work together to introduce people to the resource, through, for example, introductory classes or data challenges where people are encouraged to analyze and visualize the data.
NYC Space/Time Directory
Organization: New York Public Library
Project leads: Matthew Knutzen, David Riordan and Ben Vershbow
Twitter: @NYPLMaps, @nypl_labs
What if we could search a city’s past as easily as we search its present? What if we could explore forgotten neighborhoods, look up long-ago vanished buildings and streets, and discover the history around us? Libraries hold the records of evolving urban landscapes, but historic data that charts these changes is not easily accessible. To increase access and public exploration of this data, the New York Public Library will create a free, historical mapping service: the NYC Space/Time Directory. Using data from the library’s map collections and other sources, the directory will be a searchable, digital atlas and database of historical places, allowing scholars, students, journalists and enthusiasts to explore the city across time periods. It will be open source and community-built, engaging local museums, historical societies, universities, citizen cartographers and the New York tech community to help gather data, and to contribute code and expertise.
Prototype Fund winners
Anti-censorship Alert System by Center for Rights (Boston; project lead: Tiffiny Cheng; Twitter: @fightfortheftr): Allowing the public to see a blocked website by launching a series of tools, including an index and shareable website widgets, that enable the distribution and decentralization needed to provide local access to proxies and mirrored versions of the sites.
BklynShare by Brooklyn Public Library (New York; project lead: Michael Fieni; Twitter: @bklynlibrary): Enabling people to learn new skills through a service that connects knowledge seekers with experts in their own neighborhood.
Book a Nook by Harvard University metaLAB (Boston; project lead: Jeffrey Schnapp; Twitter: @metalabharvard, @berkmancenter, @jaytiesse): Activating library public spaces for diverse community uses by testing a software toolkit that streamlines the exploration and reservation of physical library spaces.
The Community Resource Lab by District of Columbia Public Library (Washington, D.C.; project lead: Meaghan O’Connor; Twitter: @dcpl): Advancing the library as the primary anchor of an open information system that connects residents to essential health, human and social services.
Co-working at the Library by Miami Dade Public Library (Miami; project lead: Liz Pearson; Twitter: @MDPLS): Providing freelancers, entrepreneurs and innovators a collaborative space for co-working in Miami-Dade libraries.
Indie Games Licensing by Concordia University’s TAG Research Center (Montreal; project lead: Olivier Charbonneau; Twitter: @culturelibre): Prototyping models for the licensing and circulation of independent video games at libraries.
GITenberg by Project GITenberg (Montclair, N.J., and Somerville, Mass.; project leads: Eric Hellman and Seth Woodworth; Twitter: @GITenberg): Exploring collaborative cataloging for Project Gutenberg public-domain ebooks using the Web-based repository hosting service GitHub.
Journalism Digital News Archive by University of Missouri Libraries and the Donald W. Reynolds Journalism Institute (Columbia, Mo.; project lead: Edward McCain; Twitter: @e_mccain): Ensuring access to digital news content through development of a model for archiving and preserving digital content that can be used across the country.
Maker Tool Circulating Kits by Make it @ Your Library (Chicago; project leads: Katy Hite, Amy Killebrew, Elizabeth Ludemann, Allison Parker, Vicki Rakowski; Twitter: @MakeItLib): Sharing the tools and technology of the maker movement by prototyping an equipment lending system – a process for sharing maker kits between libraries – that builds on existing interlibrary loan frameworks.
Making the Invisible Visible by Bibliocommons (Boston; project lead: Iain Lowe; Twitter: @bibliocommons, @ilowelife): Prototyping an app to give patrons a deeper library experience based on the user’s location, interests and actions in the library.
Privacy Literacy by San Jose Public Library (San Jose, Calif.; project leads: Erin Berman and Jon Worona; Twitter: @SanJoseLibrary): Developing online tools which will help individuals understand privacy in the digital age and make more informed decisions about their online activity.
Regional Business Information Bureau by Kent State University Library (Kent, Ohio; project lead: Karen MacDonald; Twitter: @KentState_LIB): Experimenting with models for a Business Information Bootcamp, connecting local entrepreneurs and small businesses to information and services that will support their growth and contributions to the local economy.
This Place Matters by Marshall University (Huntington, W.Va.; project lead: Monica Brooks; Twitter: @MUPlaceMatters): Exploring the potential of a location-aware mobile application to share African American history and link to library resources.
White Space 101 (San Francisco; project lead: Don Means; Twitter: @donmeans): Creating learning materials for libraries to explore and implement TV White Space networks to support remote library Internet hotspots that will give people wider broadband access, especially in crisis situations.
Your Next Skill by Seattle Public Library (Seattle; project lead: Jennifer Yeung; Twitter: @splbuzz): Helping people acquire new skills or expand their knowledge by creating a librarian-led referral service that connects users with materials, classes and instructors that will help them meet their goals.