Nieman Foundation at Harvard
July 19, 2012, 10:38 a.m.
Twitter preserved

That plan to archive every tweet in the Library of Congress? Definitely still happening

It has turned out to be quite an undertaking, but the Library plans to make good on its promise to America.

A little more than two years ago, the Library of Congress announced it would preserve every public tweet, ever, for future generations.

That’s right. Every public tweet, ever, since Twitter’s inception in March 2006, will be archived digitally at the Library of Congress. That’s a LOT of tweets, by the way: Twitter processes more than 50 million tweets every day, with the total numbering in the billions.

Fifty million tweets a day. How cute. That number is now 400 million, according to Twitter CEO Dick Costolo. (The first comment on the project’s FAQ page sums up much of the Internet’s reaction: “It’s critical the future generations know what flavor burrito I had for lunch.”)

We hadn’t heard about this project in some time. Last week a story on Canada.com quoted a social-media researcher as saying the LoC “has quietly backed away from the commitment.”

False, said Library spokeswoman Jennifer Gavin; the project is very much still happening. Good librarianship, she said, moves more slowly than Twitter.

“The process of how to serve it out to researchers is still being worked out, but we’re getting a lot closer,” Gavin told me. “I couldn’t give you a date specific of when we’ll be ready to make the announcement.”

The Library first revealed its plans in a tweet on April 14, 2010, but apparently that was before sorting out with Twitter the logistics of acquiring all that data. Petabytes of data.

“We began receiving the material, portions of it, last year. We got that system down. Now we’re getting it almost daily,” Gavin said. “And of course, as I think is obvious to anyone who follows Twitter, it has ended up being a very large amount of material.”

Gavin said the archive will be made available to anyone with a library card, but only on the premises in Washington. “My understanding is that at this time we do not intend to make it available by web,” she said, but that may be subject to change. It’s not meant to be the Ultimate Twitter Search Box we’ve always dreamed of.

In fact, there will be a six-month embargo on fresh tweets (even though, obviously, the data is publicly available — if you can find it). That agreement has been in place since the deal was struck. Twitter said then the tweets could be used only “for internal library use, for non-commercial research, public display by the library itself, and preservation.”

The challenge now is refining the raw data in useful ways. Sort by keywords? Date? Sentiment? Burrito flavor? Gavin said the Library is still figuring out the user interface.
