Feb. 25, 2013, 12:11 p.m.
Reporting & Production

Hiding in public: How the National Archives wants to open up its data to Americans

The agency, home to more than 500 terabytes of electronic files alone, faces some of the same problems that data journalists do.

The National Archives is sitting on massive amounts of information, from specs for NASA projects to geological surveys to letters from presidents. But there’s a problem: “These records are held hostage,” said Bill Mayer, executive for research services for the National Archives and Records Administration.

“Hostage” might be a strong word for an organization responsible for 4.5 million cubic feet of physical documents and more than 500 terabytes of data, most of which can be accessed online or by walking into one of its facilities around the country. But the challenge, Mayer explains, is making NARA’s vast stockpile more open and more discoverable. “They’re held hostage in a number of centers around the country; they’re held hostage by format,” Mayer said.

Mayer and other officials from the National Archives visited MIT recently to talk about how the agency is trying to increase access to records and deal with the challenges, and legal complications, of electronic documents. The archive is responsible for records from executive branch agencies, courts, Congress, and presidents. It preserves only 5 percent of the federal government’s records, and there’s a 15-year lag before records are available. But an estimated 30,000 linear feet of new records come in from agencies annually.

A visual summary of the National Archives’ MIT presentation by Willow Brugh (CC).

To deal with all of that, the archive has to be smarter, quicker, and more technologically savvy in the way it catalogs the nation’s paper trail. In a way, the biggest obstacle the archive faces is itself. “The issue at hand is setting free these records,” Mayer said. “At the heart of what the archive is about is promoting access.”

That’s one of the reasons the archives created an office of innovation last fall. After experimenting around the edges for several years, it was time to put more energy behind finding new ways to surface interesting material and involve the public in the record-keeping process, said Pamela Wright, the archive’s first chief innovation officer.

What started with a small project making archive photos available on Flickr has now expanded into more than 135 projects running on outside platforms, like the Today’s Document Tumblr. The archive works with companies like Ancestry.com, which helps digitize records in exchange for a brief window of exclusive access to the data. It also has a deep partnership with the Wikimedia Foundation. The National Archives has a Wikipedian in Residence who helps coordinate an open transcription project that lets the public transcribe physical documents online through a simple interface. Another project, the Citizen Archivist Dashboard, asks the public to help tag photos and other imagery, as well as contribute edits to a research wiki. It’s a focused approach to crowdsourcing, not unlike the open scientific surveys of the ocean floor or deep space.

The archive’s partnering and outreach are getting results, with an increase in visits to its website, more than 100,000 images in Wikimedia Commons, and almost 100,000 followers on Tumblr. But the goal of the National Archives’ strategy isn’t to chase social media metrics, Wright said: By working with partners and increasing its reach through social media, the archive is fulfilling its mission to make its collections available to the public. “It goes directly to the mission of our agency: You can get at participatory democracy in new ways,” she said. “You are helping your government provide access to the records of the people.”

As more federal records become available in electronic form, the archive faces a new set of complications. One, Mayer said, is that even though the archive can get records more quickly, custody of those records remains with the home agency. So even if that fisheries database you made a FOIA request for is technically at the National Archives, it may still belong to the Department of the Interior for several more years.

Another challenge — one that will come as no surprise to data journalists — is dealing with messy or incomplete federal data. The archive has to work around proprietary or outdated file formats just as newsrooms do, Mayer said. “This is actually the scary monster in the room in terms of format obsolescence,” he said. “We can maintain access to things that are currently available. But in the future? Who knows?” One solution: Work with outsiders. “We’re looking now at how do we work with the developer community,” Wright said, “working with people who want to do things with electronic datasets we can make available now.”

Wright said they want to follow in the footsteps of agencies like NASA that have held hack days and other events for coders. Finding life for the data beyond spreadsheets and XML files would be another way to accomplish their mission of openness and access, Wright said.

Photo of John F. Kennedy, J. Edgar Hoover, and Robert Kennedy from the National Archives’ Flickr account.
