Feb. 24, 2010, 10 a.m.

The Google/China hacking case: How many news outlets do the original reporting on a big story?

We often talk about the new news ecosystem — the network of traditional outlets, new startups, nonprofits, and individuals who are creating and filtering the news. But how is the work of reporting divvied up among the members of that ecosystem?

To try to build a datapoint on that question, I chose a single big story and read every single version listed on Google News to see who was doing the work. Out of the 121 distinct versions of last week’s story about tracing Google’s recent attackers to two schools in China, 13 (11 percent) included at least some original reporting. And just seven organizations (six percent) really got the full story independently.

But as usual, things are a little more subtle than that. I chose the Google-China story because it’s complex, international, sensitive, and important. It’s the sort of big story that requires substantial investigative effort, perhaps including inside sources and foreign-language reporting. Call it a stress test for our reporting infrastructure, a real-life worst case.

The New York Times broke the story last Thursday, writing that unnamed sources involved in the investigation of last year’s hacking of a number of American companies had traced the attacks to a prestigious technical university and a vocational college in mainland China. The article included comment from representatives of the schools and, while it had a San Francisco dateline, credited contributions from Shanghai staff. Immediately, the story was everywhere. Just about every major American newspaper and all the wires covered it.

When I started investigating the issue on Monday morning, Google News showed 800 different reports. But how many of these reports actually brought new information to light? By default, Google does not display duplicate copies of syndicated (or stolen) content, which brought the total down to a little more than 100 unique pieces of copy. I read each one, and several hours later, I had a spreadsheet recording the sourcing for each story. I also recorded the country of publication, the dateline or contributor location if noted, and the primary publishing medium of each outlet (paper, online, radio, etc.). An excerpt of this data is reproduced in the table below.

Here’s what I found:

Out of 121 unique stories, 13 (11 percent) contained some amount of original reporting. I counted a story as containing original reporting if it included at least an original quote. From there, things get fuzzy. Several reports, especially the more technical ones, also brought in information from obscure blogs. In some sense they didn’t publish anything new, but I can’t help feeling that these outlets were doing something worthwhile even so. Meanwhile, many newsrooms diligently called up the Chinese schools to hear exactly the same denial, which may not be adding much value.

Only seven stories (six percent) were primarily based on original reporting. These were produced by The New York Times, The Washington Post, the Wall Street Journal, The Guardian, Tech News World, Bloomberg, Xinhua (China), and the Global Times (China).

Of the 13 stories with original reporting, eight were produced by outlets that primarily publish on paper, four by wire services, and one by a primarily online outlet. For this story, the news really does come from newspapers.

Fourteen reports (12 percent) were produced by Chinese outlets, had a China dateline, or mentioned the assistance of staff in China. For a story about China, that seems awfully low to me. Perhaps this has to do with cutbacks of foreign correspondents?

Nine reports (7 percent) mentioned no source at all. Five more were partially unsourced. Given the ease of hyperlinks, this frightens me.

Google News tended to rank solid original stories fairly high in its list. Google says they rank stories based on criteria such as the reputation of a source, number of references by other articles, and the headline clickthrough rate — though they won’t reveal exactly how it’s done. The spreadsheet and table below list stories in the order that Google News ranked them.

Google’s story-clustering algorithm included three unrelated stories and missed at least one original report. The three extraneous stories were about Google and China, but not about the recent trace. The exclusion of the Financial Times’ excellent piece is a disappointment — perhaps this has something to do with their paywall? Maybe I’m biased because, as a computer scientist, I appreciate the difficulty of the problem — but I actually think this means that Google News works remarkably well, for a completely unsupervised algorithm that crawls billions of pages to find millions of stories in dozens of languages.

What were those other 100 reporters doing? When I think of how much human effort went into rewriting those hundred other unique stories that contained no original reporting, I cringe. That’s a huge amount of journalistic effort that could have gone into reporting other deserving stories. Why are we doing this? What are the legal, technical, economic and cultural barriers to simply linking to the best version of each story and moving on?

The punchline is that no English-language outlet picked up the original reporting of the Chinese-language Qilu Evening News, which was even helpfully translated by Hong Kong blogger Roland Soong. A Chinese reporter visited one of the schools in question and advanced the story by clarifying that serious hackers were unlikely to have been trained in the vocational computer classes offered there. Soong told me that Lanxiang Vocational School is well known in China for its cheesy late-night commercials and low-quality schooling: more of an educational chop shop for cooks and mechanics than the training ground for military hackers that the Times claims.
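As a quick check on the shares quoted in the findings above, the arithmetic can be reproduced in a few lines. This is only a sketch: the counts are the ones stated in the text, while the labels, the `share` helper, and the variable names are made up for illustration.

```python
# Counts quoted in the findings; 121 is the number of unique stories read.
TOTAL_STORIES = 121

counts = {
    "some original reporting": 13,
    "primarily original reporting": 7,
    "China outlet, dateline, or staff": 14,
    "no source mentioned": 9,
}


def share(count: int, total: int = TOTAL_STORIES) -> float:
    """Return count as a percentage of total."""
    return 100 * count / total


shares = {label: share(n) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]} of {TOTAL_STORIES} = {pct:.1f}%")

# Stories that contained no original reporting at all.
rewrites = TOTAL_STORIES - counts["some original reporting"]
print(f"no original reporting: {rewrites} stories")
```

Rounded to whole numbers, these shares match the percentages given in the findings, and the last line is the figure behind "those other 100 reporters."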

Tracing one story doesn’t prove anything conclusive beyond that one story, of course. And using Google News as a filter doesn’t truly represent the new news ecosystem: It excludes lots of smaller blogs and other outlets. Soong said Google News told him that his site is not eligible for inclusion in their results because they don’t include small blogs written by a single author. This seems like an arbitrary distinction, but it’s hard to imagine what defensible choice Google could make in an era where the definition of a news source is so up for grabs.

The table below is an extract from the data I collected, with original reporting highlighted. The full spreadsheet also includes country of publication, primary medium for each organization, and whether or not each story hyperlinked to its sources.

Article | Sources | Dateline
Calgary Herald | Xinhua, NYT (via AFP) |
ABC | AP, Xinhua | Shanghai
Xinhua | original | Shanghai
MarketWatch | NYT, Xinhua | San Francisco
Reuters | Xinhua, NYT | Shanghai
OneIndia | China Daily, NYT (via ANI) | Beijing
Economic Times | ? | Washington
PC Magazine Blogs | NYT |
Washington Post | original, NYT | Beijing
Times Online | NYT | Washington
Information Week | NYT, original |
FOX News | NYT (via AP) |
The Canadian Press | NYT (via AP) |
Taipei Times | (via NYT) | San Francisco
The Register | NYT, Guardian UK, blog |
The Inquirer | AP |
MarketWatch | NYT | San Francisco
ComputerWorld | NYT, blog |
Telegraph UK | NYT |
PC World | NYT, Xinhua |
Telegraph UK | NYT | Los Angeles
Wall Street Journal | original, Xinhua, NYT |
The Guardian | NYT, original |
Business Week | (Bloomberg) | Washington
AFP | NYT | New York
Reuters | NYT | New York
New York Times | original | San Francisco, Shanghai
Daily Contributor | PC World |
CCTV | China Daily, NYT, original |
Australia Network News | Xinhua, NYT |
After Dawn | ?, NYT |
Top News | NYT |
Daily Latest News | ? |
Press Trust of India | China Daily, NYT | Beijing
UPI | NYT | New York
Security Pro News | ? |
Gizmodo | NYT |
Tom’s Guide | NYT |
Digital Media Wire | NYT | Mountain View
Tech News World | original, NYT |
Global Times | original, “agencies” |
io9 | NYT, Guardian |
ZD Net | NYT |
Benzinga | NYT |
Fox Business | NYT |
CrunchGear | NYT |
AOL News | NYT, Guardian, WSJ |
Tech Blorge | NYT |
KLIV | NYT | Silicon Valley
eWeek | NYT |
TMCnet | NYT |
News.am | NYT |
Chattabox | NYT |
Datamation | NYT |
The New New Internet | NYT |
IT Pro Portal | Business Week, Telegraph, PC World |
The Hill | NYT |
Grab Geek Points | NYT |
DBTechno | NYT | Boston
IT Chuiko | NYT |
All Things Digital | NYT |
Before It’s News | NYT |
V3 | ? |
San Jose Business Journal | NYT |
Help Net Security | NYT |
Channel Web | NYT |
Marketing Pilgrim | NYT |
The Money Times | NYT |
TG Daily | NYT, Guardian |
ABH News | NYT, ? |
Top News | NYT, ? |
PCR | NYT |
Top News | NYT |
Daily Finance | NYT, Hacker Journals |
Shuttervoice | ? |
Thinq | NYT |
Top News | NYT |
New York Magazine | NYT |
Venture Beat | NYT |
Fast Company | NYT |
Gather News | NYT |
Newser | NYT |
NASDAQ | NYT (via Dow Jones Newswire) |
Reuters | Xinhua | Shanghai
PC World | NYT, Xinhua |
Herald Sun | NYT, Xinhua (via AFP) | Beijing
The Hindu | ? |
The Times of India | ? |
Daily Mail | NYT |
PC World | NYT, blogs |
ComputerWorld | NYT (via IDG) |
News.com.au | NYT |
The Globe and Mail | NYT, original (via Reuters) |
9News | NYT |
Redmond Pie | NYT, ? |
Red Orbit | NYT |
New Public | NYT |
Sydney Morning Herald | NYT (via AP) |
Gulf Times | NYT |
MyNews | Xinhua, NYT (via Indo Asian News) |
Zeenews (India) | NYT, Xinhua (via PTI) |
The Tech Herald | NYT, Guardian | Beijing
Web Pro News | Financial Times, NYT |
Business Insider | NYT |
The Financial Express | original, NYT (via Bloomberg) |
Tech Eye | NYT, ? |
CIO | NYT, WSJ (via IDG) |
Tech Blorge | NYT, Xinhua |
CNET | NYT, Xinhua |
ZD Net | NYT, Washington Post |
China Daily | NYT, original |
Beijing News | ? |
What’s on Xiamen | NYT, Xinhua |
NPR | NYT |
San Francisco Chronicle | NYT, Xinhua (via AP) | Shanghai
The Cap Times | NYT, AP, Computer World |
Little About | NYT, Xinhua (via Indo Asian News) | Jinan
Little About | NYT, original (via Asian News Intl) | Beijing
San Francisco Chronicle | NYT (via AP) | San Francisco
Portfolio.com | NYT |
World Market Media | ? |
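
For readers who want to tally the table themselves, here is a minimal sketch of one way to do it. It uses only a handful of rows copied from the table above, and it assumes that the word "original" in the Sources column marks a story with original reporting. The `Story` type, the `cited` helper, and the other names are hypothetical, not part of the spreadsheet.

```python
from collections import Counter
from typing import NamedTuple


class Story(NamedTuple):
    outlet: str
    sources: str   # comma-separated, as in the Sources column
    dateline: str  # empty when no dateline was noted


# A few rows copied from the table; the full table has 121.
SAMPLE = [
    Story("Xinhua", "original", "Shanghai"),
    Story("MarketWatch", "NYT, Xinhua", "San Francisco"),
    Story("Washington Post", "original, NYT", "Beijing"),
    Story("FOX News", "NYT (via AP)", ""),
    Story("New York Times", "original", "San Francisco, Shanghai"),
    Story("Gizmodo", "NYT", ""),
]


def cited(story: Story) -> list[str]:
    """Split the Sources cell and drop any '(via ...)' syndication note."""
    return [part.split("(")[0].strip() for part in story.sources.split(",")]


# Assumption: a row counts as original reporting if it lists "original".
originals = [s.outlet for s in SAMPLE if "original" in cited(s)]

# How often each source is credited across the sample rows.
citations = Counter(name for s in SAMPLE for name in cited(s))

print("original reporting:", originals)
print("most cited:", citations.most_common(3))
```

Running the same tally over all 121 rows would show how heavily the coverage leaned on a single source. Rows whose Sources cell is only a syndication note, such as "(via NYT)", would need extra handling that this sketch leaves out.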