Feb. 25, 2013, 12:11 p.m.
Reporting & Production

Hiding in public: How the National Archives wants to open up its data to Americans

The agency, home to more than 500 terabytes of electronic files alone, faces some of the same problems that data journalists do.

The National Archives is sitting on massive amounts of information, from specs for NASA projects to geological surveys to letters from presidents. But there’s a problem: “These records are held hostage,” said Bill Mayer, executive for research services for the National Archives and Records Administration.

“Hostage” might be a strong word for an organization responsible for 4.5 million cubic feet of physical documents and more than 500 terabytes of data, most of which can be accessed online or by walking into one of its facilities around the country. But the challenge, Mayer explains, is making NARA’s vast stockpile more open and more discoverable. “They’re held hostage in a number of centers around the country; they’re held hostage by format,” Mayer said.

Mayer and other officials from the National Archives visited MIT recently to talk about how the agency is trying to increase access to records and deal with the challenges, and legal complications, of electronic documents. The archive is responsible for records from executive branch agencies, courts, Congress, and presidents. It preserves only 5 percent of the federal government’s records, and there’s a 15-year lag before records are available. But an estimated 30,000 linear feet of new records come in from agencies annually.

A visual summary of the National Archives’ MIT presentation by Willow Brugh (CC).

To deal with all of that, the archive has to be smarter, quicker, and more technologically savvy in the way it catalogs the nation’s paper trail. In a way, the biggest obstacle the archive faces is itself. “The issue at hand is setting free these records,” Mayer said. “At the heart of what the archive is about is promoting access.”

That’s one of the reasons the archives created an office of innovation last fall. After experimenting around the edges for several years, it was time to put more energy behind finding new ways to surface interesting material and involve the public in the record-keeping process, said Pamela Wright, the archive’s first chief innovation officer.

What started with a small project making archive photos available on Flickr has now expanded into more than 135 projects running on outside platforms, like the Today’s Document Tumblr. The archive works with companies like Ancestry.com, which helps digitize records in exchange for a brief window of exclusive access to the data. They also have a deep partnership with the Wikimedia Foundation. The National Archives has a Wikipedian in Residence who helps coordinate an open transcription project that lets the public transcribe physical documents online through a simple interface. Another project, the Citizen Archivist Dashboard, asks the public to help tag photos and other imagery, as well as contribute edits to a research wiki. It’s a focused approach to crowdsourcing, not unlike the open scientific surveys of the ocean floor or deep space.

The archive’s partnering and outreach are getting results, with an increase in visits to its website, more than 100,000 images in Wikimedia Commons, and almost 100,000 followers on Tumblr. But the goal of the National Archives’ strategy isn’t to chase social media metrics, Wright said: By working with partners and increasing its reach through social media, the archive is fulfilling its mission to make its collections available to the public. “It goes directly to the mission of our agency: You can get at participatory democracy in new ways,” she said. “You are helping your government provide access to the records of the people.”

As more federal records become available in electronic form, the archive faces a new set of complications. One, Mayer said, is that even though the archive can get records more quickly, the custody of those records remains with the home agency. So even if that fisheries database you made a FOIA request for is technically at the National Archives, it may still belong to the Department of the Interior for several more years.

Another challenge — one that will come as no surprise to data journalists — is dealing with messy or incomplete federal data. The archive has to work around proprietary or outdated file formats just as newsrooms do, Mayer said. “This is actually the scary monster in the room in terms of format obsolescence,” he said. “We can maintain access to things that are currently available. But in the future? Who knows?” One solution: Work with outsiders. “We’re looking now at how do we work with the developer community,” Wright said, “working with people who want to do things with electronic datasets we can make available now.”

Wright said they want to follow in the footsteps of agencies like NASA that have held hack days and other events for coders. Finding life for the data beyond spreadsheets and XML files would be another way to accomplish their mission of openness and access, Wright said.

Photo of John F. Kennedy, J. Edgar Hoover, and Robert Kennedy from the National Archives’ Flickr account.
