Nieman Journalism Lab
Pushing to the future of journalism — A project of the Nieman Foundation at Harvard

The Google/China hacking case: How many news outlets do the original reporting on a big story?

We often talk about the new news ecosystem — the network of traditional outlets, new startups, nonprofits, and individuals who are creating and filtering the news. But how is the work of reporting divvied up among the members of that ecosystem?

To try to build a datapoint on that question, I chose a single big story and read every single version listed on Google News to see who was doing the work. Out of the 121 distinct versions of last week’s story about tracing Google’s recent attackers to two schools in China, 13 (11 percent) included at least some original reporting. And just seven organizations (six percent) really got the full story independently.

But as usual, things are a little more subtle than that. I chose the Google-China story because it’s complex, international, sensitive, and important. It’s the sort of big story that requires substantial investigative effort, perhaps including inside sources and foreign-language reporting. Call it a stress test for our reporting infrastructure, a real-life worst case.

The New York Times broke the story last Thursday, writing that unnamed sources involved in the investigation of last year’s hacking of a number of American companies had traced the attacks to a prestigious technical university and a vocational college in mainland China. The article included comment from representatives of the schools and, while it had a San Francisco dateline, credited contributions from Shanghai staff. Immediately, the story was everywhere. Just about every major American newspaper and all the wires covered it.

When I started investigating the issue on Monday morning, Google News showed 800 different reports. But how many of these reports actually brought new information to light? By default, Google does not display duplicate copies of syndicated (or stolen) content, which brought the total down to just over 100 unique pieces of copy. I read each one, and several hours later, I had a spreadsheet recording the sourcing for each story. I also recorded the country of publication, the dateline or contributor location if noted, and the primary publishing medium of each outlet (paper, online, radio, etc.). An excerpt of this data is reproduced in the table below.

Here’s what I found:

Out of 121 unique stories, 13 (11 percent) contained some amount of original reporting. I counted a story as containing original reporting if it included at least an original quote. From there, things get fuzzy. Several reports, especially the more technical ones, also brought in information from obscure blogs. In some sense they didn’t publish anything new, but I can’t help feeling that these outlets were doing something worthwhile even so. Meanwhile, many newsrooms diligently called up the Chinese schools to hear exactly the same denial, which may not be adding much value.

Only seven stories (six percent) were primarily based on original reporting. These were produced by The New York Times, The Washington Post, the Wall Street Journal, The Guardian, Tech News World, Bloomberg, Xinhua (China), and the Global Times (China).

Of the 13 stories with original reporting, eight were produced by outlets that primarily publish on paper, four were produced by wire services, and one was produced by a primarily online outlet. For this story, the news really does come from newspapers.

14 reports (12 percent) were produced by Chinese outlets, had a China dateline, or mentioned the assistance of staff in China. For a story about China, that seems awfully low to me. Perhaps this has to do with cutbacks of foreign correspondents?

Nine reports (seven percent) mentioned no source at all. Five more were partially unsourced. Given the ease of hyperlinks, this frightens me.

Google News tended to rank solid original stories fairly high in its list. Google says they rank stories based on criteria such as the reputation of a source, number of references by other articles, and the headline clickthrough rate — though they won’t reveal exactly how it’s done. The spreadsheet and table below list stories in the order that Google News ranked them.

Google’s story-clustering algorithm included three unrelated stories and missed at least one original report. The three extraneous stories were about Google and China, but not about the recent trace. The exclusion of the Financial Times’ excellent piece is a disappointment — perhaps this has something to do with their paywall? Maybe I’m biased because, as a computer scientist, I appreciate the difficulty of the problem — but I actually think this means that Google News works remarkably well, for a completely unsupervised algorithm that crawls billions of pages to find millions of stories in dozens of languages.
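
The percentages in the findings above follow from simple division over the 121 unique stories. As a minimal sketch, the arithmetic can be checked as follows. The counts are taken from the text; the dictionary labels and the helper name `share` are illustrative only and are not taken from the original spreadsheet.

```python
# Counts as reported in the article. The labels are illustrative,
# not the column names of the author's spreadsheet.
TOTAL_STORIES = 121

counts = {
    "some original reporting": 13,
    "primarily original reporting": 7,
    "China outlet, dateline, or staff": 14,
    "no source mentioned": 9,
}


def share(count: int, total: int = TOTAL_STORIES) -> int:
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


percentages = {label: share(n) for label, n in counts.items()}

# Medium breakdown of the 13 stories with original reporting,
# as given in the text: paper, wire service, online.
by_medium = {"paper": 8, "wire": 4, "online": 1}
assert sum(by_medium.values()) == counts["some original reporting"]

for label, pct in percentages.items():
    print(f"{label}: {counts[label]} of {TOTAL_STORIES} ({pct} percent)")
```

Rounding to whole percentages reproduces the 11, 6, 12, and 7 percent figures quoted above.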

What were those other 100 reporters doing? When I think of how much human effort went into re-writing those hundred other unique stories that contained no original reporting, I cringe. That’s a huge amount of journalistic effort that could have gone into reporting other deserving stories. Why are we doing this? What are the legal, technical, economic and cultural barriers to simply linking to the best version of each story and moving on?

The punchline is that no English-language outlet picked up the original reporting of the Chinese-language Qilu Evening News, which was even helpfully translated by Hong Kong blogger Roland Soong. A Chinese reporter visited one of the schools in question and advanced the story by clarifying that serious hackers were unlikely to have been trained in the vocational computer classes offered there. Soong told me that Lanxiang Vocational School is well known in China for its cheesy late-night commercials and low-quality schooling, more of an educational chop shop for cooks and mechanics than the training ground for military hackers that the Times claims.

Tracing one story doesn’t prove anything conclusive beyond that one story, of course. And using Google News as a filter doesn’t truly represent the new news ecosystem: It excludes lots of smaller blogs and other outlets. Soong said Google News told him that his site is not eligible for inclusion in their results because they don’t include small blogs written by a single author. This seems like an arbitrary distinction, but it’s hard to imagine what defensible choice Google could make in an era where the definition of a news source is so up for grabs.

The table below is an extract from the data I collected; stories with original reporting list “original” among their sources. The full spreadsheet also includes country of publication, primary medium for each organization, and lists whether or not each story hyperlinked to its sources.

Article | Sources | Dateline
Calgary Herald | Xinhua, NYT (via AFP) |
ABC | AP, Xinhua | Shanghai
Xinhua | original | Shanghai
MarketWatch | NYT, Xinhua | San Francisco
Reuters | Xinhua, NYT | Shanghai
OneIndia | China Daily, NYT (via ANI) | Beijing
Economic Times | ? | Washington
PC Magazine Blogs | NYT |
Washington Post | original, NYT | Beijing
Times Online | NYT | Washington
Information Week | NYT, original |
FOX News | NYT (via AP) |
The Canadian Press | NYT (via AP) |
Taipei Times | (via NYT) | San Francisco
The Register | NYT, Guardian UK, blog |
The Inquirer | AP |
MarketWatch | NYT | San Francisco
ComputerWorld | NYT, blog |
Telegraph UK | NYT |
PC World | NYT, Xinhua |
Telegraph UK | NYT | Los Angeles
Wall Street Journal | original, Xinhua, NYT |
The Guardian | NYT, original |
Business Week | (Bloomberg) | Washington
AFP | NYT | New York
Reuters | NYT | New York
New York Times | original | San Francisco, Shanghai
Daily Contributor | PC World |
CCTV | China Daily, NYT, original |
Australia Network News | Xinhua, NYT |
After Dawn | ?, NYT |
Top News | NYT |
Daily Latest News | ? |
Press Trust of India | China Daily, NYT | Beijing
UPI | NYT | New York
Security Pro News | ? |
Gizmodo | NYT |
Tom’s Guide | NYT |
Digital Media Wire | NYT | Mountain View
Tech News World | original, NYT |
Global Times | original, “agencies” |
io9 | NYT, Guardian |
ZD Net | NYT |
Benzinga | NYT |
Fox Business | NYT |
CrunchGear | NYT |
AOL News | NYT, Guardian, WSJ |
Tech Blorge | NYT |
KLIV | NYT | Silicon Valley
eWeek | NYT |
TMCnet | NYT |
News.am | NYT |
Chattabox | NYT |
Datamation | NYT |
The New New Internet | NYT |
IT Pro Portal | Business Week, Telegraph, PC World |
The Hill | NYT |
Grab Geek Points | NYT |
DBTechno | NYT | Boston
IT Chuiko | NYT |
All Things Digital | NYT |
Before It’s News | NYT |
V3 | ? |
San Jose Business Journal | NYT |
Help Net Security | NYT |
Channel Web | NYT |
Marketing Pilgrim | NYT |
The Money Times | NYT |
TG Daily | NYT, Guardian |
ABH News | NYT, ? |
Top News | NYT, ? |
PCR | NYT |
Top News | NYT |
Daily Finance | NYT, Hacker Journals |
Shuttervoice | ? |
Thinq | NYT |
Top News | NYT |
New York Magazine | NYT |
Venture Beat | NYT |
Fast Company | NYT |
Gather News | NYT |
Newser | NYT |
NASDAQ | NYT (via Dow Jones Newswire) |
Reuters | Xinhua | Shanghai
PC World | NYT, Xinhua |
Herald Sun | NYT, Xinhua (via AFP) | Beijing
The Hindu | ? |
The Times of India | ? |
Daily Mail | NYT |
PC World | NYT, blogs |
ComputerWorld | NYT (via IDG) |
News.com.au | NYT |
The Globe and Mail | NYT, original (via Reuters) |
9News | NYT |
Redmond Pie | NYT, ? |
Red Orbit | NYT |
New Public | NYT |
Sydney Morning Herald | NYT (via AP) |
Gulf Times | NYT |
MyNews | Xinhua, NYT (via Indo Asian News) |
Zeenews (India) | NYT, Xinhua (via PTI) |
The Tech Herald | NYT, Guardian | Beijing
Web Pro News | Financial Times, NYT |
Business Insider | NYT |
The Financial Express | original, NYT (via Bloomberg) |
Tech Eye | NYT, ? |
CIO | NYT, WSJ (via IDG) |
Tech Blorge | NYT, Xinhua |
CNET | NYT, Xinhua |
ZD Net | NYT, Washington Post |
China Daily | NYT, original |
Beijing News | ? |
What’s on Xiamen | NYT, Xinhua |
NPR | NYT |
San Francisco Chronicle | NYT, Xinhua (via AP) | Shanghai
The Cap Times | NYT, AP, Computer World |
Little About | NYT, Xinhua (via Indo Asian News) | Jinan
Little About | NYT, original (via Asian News Intl) | Beijing
San Francisco Chronicle | NYT (via AP) | San Francisco
Portfolio.com | NYT |
World Market Media | ? |
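
A tally like the one behind the 13-story figure can be sketched in a few lines. The rows below are a small sample from the table above, split by hand into outlet, sources, and dateline. The helper name `has_original_reporting` is hypothetical, and both rules it relies on are assumptions about how the table is marked up rather than the author's actual spreadsheet logic: a source entry beginning with "original" signals original reporting, and a lone "?" means no source was given.

```python
# Sample rows (outlet, sources, dateline) taken from the table.
# This is NOT the full data set, only an illustration.
rows = [
    ("Calgary Herald", "Xinhua, NYT (via AFP)", ""),
    ("Xinhua", "original", "Shanghai"),
    ("Washington Post", "original, NYT", "Beijing"),
    ("Gizmodo", "NYT", ""),
    ("The Globe and Mail", "NYT, original (via Reuters)", ""),
    ("Economic Times", "?", "Washington"),
]


def has_original_reporting(sources: str) -> bool:
    """True if any comma-separated source entry starts with 'original' (assumed convention)."""
    return any(entry.strip().startswith("original") for entry in sources.split(","))


# Outlets whose source list includes original reporting.
original = [outlet for outlet, sources, _ in rows if has_original_reporting(sources)]

# Outlets whose source column is a lone "?", assumed to mean "no source given".
unsourced = [outlet for outlet, sources, _ in rows if sources.strip() == "?"]

print(f"{len(original)} of {len(rows)} sample rows include original reporting: {original}")
print(f"{len(unsourced)} sample row(s) list no source: {unsourced}")
```

Running the same check over all 121 rows would be one way to confirm the counts reported in the findings.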
                                   
  • http://flavors.me/howard Howard Weaver

    Nice work on the analysis.

    Although you seem reluctant to say so, one additional conclusion seems incontrovertible: almost all the genuine journalism here was done by traditional organizations.

  • http://www.lot49.com Thomas Claburn

    A related and perhaps more interesting question: If Google News rewarded original reporting by placement and time of story exposure, would that generate enough extra traffic that news organizations could afford to hire the reporters necessary to do that kind of reporting?

    Given the power of Google News, referring often a third to a half of the non-direct traffic at a news site, the Google News algorithm plays a major role in news site revenue. And the coverage usually follows the money.

  • The Dude

    Howard, that’s not necessarily true, since Google News tends to exclude non-traditional sources to begin with. Otherwise ESWN would show up all the time on these China-related stories, doing original research and reporting. Besides, it’s ridiculously common for newspapers to go out of their way to avoid telling you that they got their info from a blog post or a YouTube video. Or indeed another paper, as demonstrated here. They always give the impression that their plucky foreign correspondent did all the work, even when it’s just a regurgitated wire story.

  • ChasL

    I would also like to add that when content isn’t original, strict fact-checking isn’t enforced.

    Ever wonder how Lanxiang, a 3rd-rate voc-tech school, got implicated? IMHO it’s due to a bad translation of the school’s PR that mistook “culinary technicians” for “technology officers”.

    While some graduates from Lanxiang’s culinary school do enlist in the military, they are far from the “computer scientist” NYT is claiming.

  • http://caracina.wordpress.com caracina

    Really good post. Just imagine what would happen if, in addition to your research, we started to collect the news written in languages other than English…

  • http://worldcolouredglasses.blogspot.com Ann Danylkiw

    I’m looking for a media sponsor so I can go to China to write about China on a journalist visa — I specialize in economics, finance, and climate change. But have I had any luck with this? nope.

    These numbers are shameful. You can’t adequately write about a country you’re not physically in. I don’t care how many culture or history books, magazines or newspapers you read!

  • http://www.varpartners.net Douglas

    You’ve done a service to journalism by pointing this out. I think what is very interesting about the situation though is that it could be that original reporting could happen after the original story is broken open.

    Think of it this way: the original story comes out in the New York Times. It’s old hat that now every other news service will work like an aggregator and just make sure they “cover” the covering of the news. That’s what blogging has done to influence the way news outlets function.

    But the real reporting might be other stories that cracked open after reading the original piece. I say this with a caveat. I’ve written to several other news organizations to get them to pay attention to what has come up on Roland Soong’s web site: the real story is about the corruption and function of some of these universities.

    See here: http://www.zonaeuropa.com/201002c.brief.htm#005

    It is very rare that you read any story, based in China or otherwise, that talks about the schools as hotbeds of corruption and less than appealing methodology. But where are the reporters to cover this? That’s a whole new field of opportunity.

    There was one school in China that actually addressed this issue: Shantou University.

    Here’s context: http://www.zonaeuropa.com/20060109_1.htm

  • http://byjoeybaker.com Joey Baker

    I think the spin everyone seems to be putting on these facts is all wrong. I’m horribly impressed that there are 13 journalists working on one story!

    I do count curation as journalism, and I’d be curious to know what “obscure blogs” were being curated, because those blogs are being treated as sources. If that’s the case, then the sources are going direct, and they too are committing the act of journalism. Which means there are journalists adding to this story that haven’t been counted.

  • http://jonathanstray.com Jonathan Stray

    Joey-

    There are definitely journalists who haven’t been counted, as I touch upon in my discussion of Roland Soong’s blog. It’s also wrong to define “news” as what’s listed on Google News, but I had to start somewhere.

    For a good perspective on these results — including this issue of curation — I recommend C.W. Anderson’s follow up piece: http://www.niemanlab.org/2010/02/burbling-blips-pyramiding-what-does-the-google-china-story-tell-us-about-how-news-spreads/

  • http://twitter.com/owenfletcher Owen Fletcher (IDG)

    Hello Jonathan,

    I had the following thoughts on various points you made in this post. I’m a reporter for IDG, so I wrote the story for PCWorld. (I’m actually in Beijing and that was my dateline. PCWorld just doesn’t list it.)

    So, I suggest that looking at this single slice of the bigger Google-in-China story may lead us to underestimate how many reporters are doing related original reporting. Yes, for this specific story the percentage was low. But I think you’d agree that’s largely because sources with the information about the hacking investigation were tough to get. If you were to take on the much more massive task of reading all the Google-in-China stories since they started on Jan 12, you’d find many more of us had done various stories that used original reporting to take forward the analysis or the factual context of the broader Google issue – even if we were lazy (or making an economical decision) for this particular story.

    You ask what the other 100 reporters were doing. Well, consider this issue with the Chinese schools a sub-story in the broader set of Google-in-China stories. Finding a source for the actual interesting part of this particular sub-story – the progress in the hacking investigation, not the schools’ denial – was very difficult. So I, presumably like some other reporters, was spending my time trying to get the “reportedly” story out ASAP so I could start working on some other story where I’d get a higher return on my invested time. (I.e., a story for which I would have to work less but would also have some fresh angle on the broader Google issue; or a story on some other news topic altogether, for which sources were more readily available to me or to the public.) Our news organizations all have limited resources and we want to spend as little time as possible duplicating other people’s work. It’s clear economics.

    As to other thoughts: You touch on this, but “original reporting” in the form of new quotes doesn’t equate to value. You can get your own quotes without providing any new information or analysis. You’ve listed the Tech News World report, for instance, as based mainly on original reporting. But the first original citation it has (from McAfee) appears to be in graph 10 or graph 12. And in the following graphs, the original quotes from its three sources (McAfee, Kaspersky and Google) are mainly them declining to comment. That adds little new information – and, incidentally, a few graphs of that seems to me like a turn-off for readers. So a better economic decision for the reporter might have been just to leave some or all of those quotes out.

    Also, you mention that no one cited the Qilu Evening News story. I find that unsurprising. I appreciate that Roland Soong translated it; it’s good for the source at least to be available in English. But I doubt most foreign reporters have heard of that paper. And many of us try to cite Chinese media only when we know the source has a good reputation (in principle, at least). This is good practice particularly because of problems with traditional reporter ethics in parts of Chinese journalism. There is much good Chinese journalism, but for instance, reporters here are often issued a sum like US$30 each for attending any company’s press conference; they widely seem to use direct quotes when in fact they’re paraphrasing a source; they go on reporting trips paid by the companies they report on, though I suppose that’s increasingly common in the West too; and in general the goal of unbiased journalism, or at least the standards for it, seem looser in China than in the US. In fact, I at least strive to cite Chinese media only if it is state-run, like Xinhua or China Daily. That at least gives their words a degree of government backing, though no guarantee of accurate or unbiased information. I’m not saying Qilu was biased somehow in this specific case, but in general I don’t like citing relatively unknown Chinese media.

    It also seems unsurprising that most of the original reporting was done by traditional news outlets, rather than blogs (which I think is the distinction being drawn). Reporters are paid to do this work – at least I’d guess we’re paid more than most people making money by blogging – and when a story requires substantial investigative effort you’re not going to find many people doing it in their free time.

    Last, I’m also disturbed by the nine reports that mentioned no source at all. Were the authors just trying to hide their lack of original reporting?

    Yours,
    Owen Fletcher

  • Kelvin

    “14 reports (12 percent) were produced by Chinese outlets, had a China dateline, or mentioned the assistance of staff in China.”

    I feel like this particular story may have this metric due to being about a sensitive topic in China – where press is strictly regulated.

  • ChasL

    Joey, can you really count bloggers as journalists? The biggest mistakes can be attributed to quoting blogs without fact checking.

    For example, the “military spy central” description of Lanxiang Vocational by the Washington Post’s Ellen Nakashima was quoted directly from the China Digital Times blog:

    http://chinadigitaltimes.net/2010/02/two-chinese-schools-said-to-be-tied-to-online-attacks/

    There, a Lanxiang blog post praising 17 students for earning “technical sergeant” enlistment (the equivalent of our specialist 2nd class) was mistranslated as “technology officers” and taken as proof of Lanxiang’s military connection to technology and hacking.

  • http://jonathanstray.com Jonathan Stray

    Owen,

    I agree that sources with information on the investigation were hard to get. That’s actually part of why I chose this story. It was intentionally a very hard case. I’d like to repeat the analysis with a simpler story, but it’s not clear to me how to choose. Anyone have ideas?

    I actually don’t have strong feelings on the whole MSM vs. blogs thing. Personally, I think the lines are increasingly blurry. I was more interested in how many journalists are doing original reporting.

    But, if “we want to spend as little time as possible duplicating other people’s work,” then why are these rewrites happening at all? More and more I think the news industry is sorely in need of a syndication system that works online, an Associated Press for the 21st century.

    (The reasons why AP itself isn’t up to that task are complex and deep. For one thing, they don’t move HTML stories.)

    You are also absolutely right in that Chinese reporting is more complex than it first seems. I’m in Beijing myself this week; let’s get a Tsing Tao and talk further.

    – Jonathan

  • http://blog.forsino.com Yves

    Nice work!

    The boss of the notorious Lanxiang school must pay for these 121 correspondents:

    See this topic:
    http://translate.google.com/translate?js=y&prev=_t&hl=en&ie=UTF-8&layout=1&eotf=1&u=http%3A%2F%2Fwww.cnbeta.com%2Farticles%2F104748.htm&sl=auto&tl=en

  • http://twitter.com/owenfletcher Owen Fletcher (IDG)

    Hi Jonathan,

    Thanks for your reply. And for doing this post in the first place, which I didn’t mention last time.

    I at least am still doing the rewrites since they don’t take long and they still get a lot of page views. Makes sense again economically.

    Some new syndication service sounds useful. How to do that while also keeping a variety of news organizations, we can discuss later.

    Owen

  • http://www.google.com/profiles/sharunsanthosh Sharun

    If original reporting can be ranked on the uniqueness (and possibly “quality”) of the quotes, as seems to be the basis of this analysis, I wonder how long it’s going to be before we get “original content” stats/rating/score beside the story on Google News (or some other aggregator). An automated process should be able to reproduce the first four things you highlight, and it would be very useful for users to see it.

    It’s quite possible they already do this behind the scenes. Even if the stats don’t show up and they push traffic towards original content, it’s a good thing for everyone, including the news orgs.

    But having said that, all the pleonastic content on the internet does have a purpose, it allows the content to flow far and wide, through mindless replication. Thanks to which I have landed here. Thanks too for the great post!

  • http://www.taazza.com Arjun Ram

    Most of the Indian sources that you point out seem to syndicate from IANS & PTI, which are news agencies. But your larger point is well taken.

    The problem is that there isn’t a service that tracks what journalists report on. Some sort of a reporterRank. This would go a long way toward running leaner, meaner organizations that carve a niche for themselves and are profitable!

  • http://www.mojomag.de Clemens Gleich

    Great work, well compiled for giving it to those ignorant of the inner workings of the news biz.

    I have been working in this business for a long time and all your findings match my experiences. If someone has a story, we must have it too – on our site. But we don’t have much time, so let’s just translate it, rewrite it and add a quote from the press release, if we can find it.

    The big problem with this is that an error gets repeated around the world until its false content is as good as true. And the old “check at least two sources” doesn’t work in a time where 90 percent of these so-called sources have just copied the stuff, too.
