Nieman Foundation at Harvard
Feb. 25, 2013, 12:11 p.m.
Reporting & Production
Hiding in public: How the National Archives wants to open up its data to Americans

The agency, home to more than 500 terabytes of electronic files alone, faces some of the same problems that data journalists do.

The National Archives is sitting on massive amounts of information, from specs for NASA projects to geological surveys to letters from presidents. But there’s a problem: “These records are held hostage,” said Bill Mayer, executive for research services for the National Archives and Records Administration.

“Hostage” might be a strong word for an organization responsible for 4.5 million cubic feet of physical documents and more than 500 terabytes of data, most of which can be accessed online or by walking into one of its facilities around the country. But the challenge, Mayer explains, is making NARA’s vast stockpile more open and more discoverable. “They’re held hostage in a number of centers around the country; they’re held hostage by format,” Mayer said.

Mayer and other officials from the National Archives visited MIT recently to talk about how the agency is trying to increase access to records and deal with the challenges, and legal complications, of electronic documents. The archive is responsible for records from executive branch agencies, courts, Congress, and presidents. It preserves only 5 percent of the federal government’s records, and there’s a 15-year lag before records are available. But an estimated 30,000 linear feet of new records come in from agencies annually.

A visual summary of the National Archives’ MIT presentation by Willow Brugh (CC).

To deal with all of that, the archive has to be smarter, quicker, and more technologically savvy in the way it catalogs the nation’s paper trail. In a way, the biggest obstacle the archive faces is itself. “The issue at hand is setting free these records,” Mayer said. “At the heart of what the archive is about is promoting access.”

That’s one of the reasons the archives created an office of innovation last fall. After experimenting around the edges for several years, it was time to put more energy behind finding new ways to surface interesting material and involve the public in the record-keeping process, said Pamela Wright, the archive’s first chief innovation officer.

What started with a small project making archive photos available on Flickr has now expanded into more than 135 projects running on outside platforms, like the Today’s Document Tumblr. The archive works with companies like Ancestry.com, which helps digitize records in exchange for a brief window of exclusive access to the data. They also have a deep partnership with the Wikimedia Foundation. The National Archives has a Wikipedian in Residence who helps coordinate an open transcription project that lets the public transcribe physical documents online through a simple interface. Another project, the Citizen Archivist Dashboard, asks the public to help tag photos and other imagery, as well as contribute edits to a research wiki. It’s a focused approach to crowdsourcing, not unlike the open scientific surveys of the ocean floor or deep space.

The archive’s partnering and outreach are getting results, with an increase in visits to its website, more than 100,000 images in Wikimedia Commons, and almost 100,000 followers on Tumblr. But the goal of the National Archives’ strategy isn’t to chase social media metrics, Wright said: By working with partners and increasing its reach through social media, the archive is fulfilling its mission to make its collections available to the public. “It goes directly to the mission of our agency: You can get at participatory democracy in new ways,” she said. “You are helping your government provide access to the records of the people.”

As more federal records become available in electronic form, the archive faces a new set of complications. One, Mayer said, is that even though the archive can get records more quickly, the custody of those records remains with the home agency. So even if that fisheries database you made a FOIA request for is technically at the National Archives, it may still belong to the Department of the Interior for several more years.

Another challenge, one that will come as no surprise to data journalists, is dealing with messy or incomplete federal data. The archive has to work around proprietary or outdated file formats just as newsrooms do, Mayer said. “This is actually the scary monster in the room in terms of format obsolescence,” he said. “We can maintain access to things that are currently available. But in the future? Who knows?” One solution: Work with outsiders. “We’re looking now at how do we work with the developer community,” Wright said, “working with people who want to do things with electronic datasets we can make available now.”

Wright said they want to follow in the footsteps of agencies like NASA that have held hack days and other events for coders. Finding life for the data beyond spreadsheets and XML files would be another way to accomplish their mission of openness and access, Wright said.

Photo of John F. Kennedy, J. Edgar Hoover, and Robert Kennedy from the National Archives’ Flickr account.
