Nieman Journalism Lab
Pushing to the future of journalism — A project of the Nieman Foundation at Harvard

Linking by the numbers: How news organizations are using links (or not)

In my last post, I reported on the stated linking policies of a number of large news organizations. But nothing speaks like numbers, so I also trawled through the stories on the front pages of a dozen online news outlets, counting links, looking at where they went, and how they were used.

I checked 262 stories in all, and to a certain degree, I found what you’d expect: Online-only publications were typically more likely to make good use of links in their stories. But I also found that use of links often varies wildly within the same publication, and that many organizations link mostly to their own topic pages, which are often of less obvious value.

My survey included several major international news organizations, some online-only outlets, and some more blog-like sites. Given the ongoing discussion about the value of external links, and the evident popularity of topic pages, I sorted links into “internal”, “external”, and “topic page” categories. I included only inline links, excluding “related articles” sections and sidebars.
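The post doesn't spell out the exact rules for the three-way split. As a rough illustration only, here is a minimal Python sketch that sorts one link by its URL. It assumes that topic pages can be recognized by a path segment such as `/topic/` or `/topics/`, which is a hypothetical convention (each outlet uses its own scheme), and it treats subdomains of the outlet's own domain as internal. The domain `example.com` is a placeholder.

```python
from collections import Counter
from urllib.parse import urljoin, urlparse

# Hypothetical path markers for topic pages; real outlets each use their own scheme.
TOPIC_MARKERS = ("/topic/", "/topics/")


def classify_link(href: str, story_url: str) -> str:
    """Sort one inline link into "internal", "external", or "topic page"."""
    target = urlparse(urljoin(story_url, href))  # resolve relative hrefs first
    site = (urlparse(story_url).hostname or "").removeprefix("www.")
    host = target.hostname or ""
    # Treat the outlet's own subdomains (e.g. blogs.example.com) as internal.
    own_site = host == site or host.endswith("." + site)
    if own_site and any(marker in target.path for marker in TOPIC_MARKERS):
        return "topic page"
    return "internal" if own_site else "external"


def tally(hrefs: list[str], story_url: str) -> Counter:
    """Count link categories for the inline links of one story."""
    return Counter(classify_link(href, story_url) for href in hrefs)


if __name__ == "__main__":
    story = "https://www.example.com/news/cable-deal.html"
    print(tally(["/topics/hbo/", "/news/earlier.html", "https://www.usgs.gov/"], story))
```

A URL rule like this will miss any topic page that doesn't follow the assumed pattern, so in practice the categories still need a human eye.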

Twelve hand-picked news outlets hardly make up an unbiased sample of the entire world of online news, nor can data from one day be counted as comprehensive. But call it a data point — or a beginning. For the truly curious, the spreadsheet contains article-level numbers and notes.

Of the dozen online news outlets surveyed, the median of the outlets’ averages was 2.6 links per article. Here’s the average number of links per article for each outlet:

Source                     Internal  External  Topic page  Total
BBC News                   0         0         0           0
CNN                        0.3       0.2       0.7         1.2
Politico                   0.7       0.2       0.6         1.5
Reuters                    0.1       0.2       1.4         1.7
Huffington Post            1.1       1.0       0           2.1
The Guardian               0.5       0.2       1.8         2.4
[outlet name missing]      0.9       1.9       0           2.8
Washington Post            1.0       0.3       2.0         3.3
Christian Science Monitor  2.5       1.1       0           3.6
TechCrunch                 1.8       3.6       1.2         6.6
The New York Times         1.0       1.2       4.6         6.8
Nieman Journalism Lab      1.4       13.1      0           14.5

Taking the median across outlets for each category, internal links came to 0.95 per article, external links to 0.65, and topic page links also to 0.65. I had expected that online-only publications would have more links, but that’s not really what we see here. TechCrunch and our own Lab articles rank quite high, but so does The New York Times. Conversely, the BBC, Reuters, CNN, and The Huffington Post are not converting from a print mindset, so I would have expected them to be more web native, yet they all rank in the bottom half.
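These medians are taken across the twelve outlets' per-article averages, so with an even count each one is the midpoint of the sixth and seventh values. A short Python check with the averages copied from the table reproduces them. Two notes on the data: the row whose outlet name is missing appears under a placeholder key, and the Reuters row is the one matching the 1.7 average and 0.3 non-topic links reported for Reuters further down. The reported totals are used rather than summing the columns, since The Guardian's columns add to 2.5 against a reported 2.4, presumably from rounding.

```python
from statistics import median

# Average links per article, copied from the table:
# (internal, external, topic page, reported total).
OUTLETS = {
    "BBC News": (0.0, 0.0, 0.0, 0.0),
    "CNN": (0.3, 0.2, 0.7, 1.2),
    "Politico": (0.7, 0.2, 0.6, 1.5),
    "Reuters": (0.1, 0.2, 1.4, 1.7),
    "Huffington Post": (1.1, 1.0, 0.0, 2.1),
    "The Guardian": (0.5, 0.2, 1.8, 2.4),
    "[outlet name missing]": (0.9, 1.9, 0.0, 2.8),  # name lost from the table
    "Washington Post": (1.0, 0.3, 2.0, 3.3),
    "Christian Science Monitor": (2.5, 1.1, 0.0, 3.6),
    "TechCrunch": (1.8, 3.6, 1.2, 6.6),
    "The New York Times": (1.0, 1.2, 4.6, 6.8),
    "Nieman Journalism Lab": (1.4, 13.1, 0.0, 14.5),
}

COLUMNS = ("internal", "external", "topic page", "total")


def column_medians(rows: dict[str, tuple[float, ...]]) -> dict[str, float]:
    """Median across outlets of each column (rounded to hide float noise)."""
    return {
        name: round(median(values[i] for values in rows.values()), 2)
        for i, name in enumerate(COLUMNS)
    }


print(column_medians(OUTLETS))
# {'internal': 0.95, 'external': 0.65, 'topic page': 0.65, 'total': 2.6}
```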

What’s going on here? In short, we’re seeing lots of what appear to be automatically generated links to topic pages. Many organizations are using topic pages as their primary linking strategy. The majority of links from The New York Times, The Washington Post, CNN, and Politico (and for some of these outlets the vast majority) were to branded topic pages.

Topic pages can be a really good idea, providing much needed context and background material for readers. But as Steve Yelvington has noted, topic pages aren’t worth much if they’re not taken seriously. He singles out “misplaced trust in automation” as a pitfall. Like many topic pages, this CNN page is nothing more than a pile of links to related stories.

It doesn’t seem very useful to devote such a high percentage of a story’s links to directing readers to such pages. I wonder about the value of heavy linking to broad topic pages in general. How much is the New York Times reader really served by having a link to the HBO topic page from every story about the cable industry, or the Washington Post reader by links on mentions of the “GOP”?

I suspect that links to topic pages are flourishing because such links can be generated by automated tools and because topic pages can be an SEO strategy, not because topic page links add great journalistic value. My suspicion is that most of the topic page links we are seeing here were inserted automatically or semi-automatically. There’s nothing wrong with automation, but with present technology, automatically inserted links are not as relevant as hand-coded ones.

So what do we see when we exclude topic page links?

Excluding links to topic pages — counting only definitely hand-written links — the median number of links per article drops to 1.7. The implication here is that something like 30 percent of the links that one finds in online news articles across the web go to topic pages, which certainly matches my reading experience. Sorting the outlets by internal-plus-external links also shows an interesting shift in the linking leaderboard.

Source                     Internal  External  Total
BBC News                   0         0         0
Reuters                    0.1       0.2       0.3
CNN                        0.3       0.2       0.5
The Guardian               0.5       0.2       0.7
Politico                   0.7       0.2       0.9
Washington Post            1.0       0.3       1.3
Huffington Post            1.1       1.0       2.1
The New York Times         1.0       1.2       2.2
[outlet name missing]      0.9       1.9       2.8
Christian Science Monitor  2.5       1.1       3.6
TechCrunch                 1.8       3.6       5.4
Nieman Journalism Lab      1.4       13.1      14.5


The Times and the Post have moved down, and online-only outlets and the Christian Science Monitor have moved up. TechCrunch still ranks high with a lot of linking any way you slice it, and the Lab is still the linkiest because we’re weird like that. (To prevent cheating, I didn’t tell anyone at the Lab, or elsewhere, that I was doing this survey.) But the BBC, CNN, and Reuters are still at the bottom.
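The 1.7 median and the share of links going to topic pages can both be checked against the tables' rounded figures. The share can be estimated two ways: by how far the median falls when topic pages are dropped, or by pooling all the outlets' averages. A quick Python sketch follows; both estimates are rough, since the inputs are per-outlet averages rather than raw link counts.

```python
from statistics import median

# Per-outlet averages (links per article), copied from the two tables.
# Row order doesn't matter here: every figure below is a median or a sum.
TOTAL_ALL = [0.0, 1.2, 1.5, 1.7, 2.1, 2.4, 2.8, 3.3, 3.6, 6.6, 6.8, 14.5]
TOPIC = [0.0, 0.7, 0.6, 1.4, 0.0, 1.8, 0.0, 2.0, 0.0, 1.2, 4.6, 0.0]
TOTAL_NO_TOPIC = [0.0, 0.3, 0.5, 0.7, 0.9, 1.3, 2.1, 2.2, 2.8, 3.6, 5.4, 14.5]


def topic_share_estimates() -> dict[str, float]:
    """Two rough estimates of the fraction of links that go to topic pages."""
    med_all = median(TOTAL_ALL)
    med_no_topic = median(TOTAL_NO_TOPIC)
    return {
        "median, all links": round(med_all, 2),
        "median, no topic pages": round(med_no_topic, 2),
        # How far the median falls, relative to where it started.
        "share via medians": round((med_all - med_no_topic) / med_all, 2),
        # Topic-page links over all links, pooling the outlet averages.
        "share via pooled averages": round(sum(TOPIC) / sum(TOTAL_ALL), 2),
    }


print(topic_share_estimates())
# {'median, all links': 2.6, 'median, no topic pages': 1.7,
#  'share via medians': 0.35, 'share via pooled averages': 0.26}
```

On these figures the two estimates come out at about 35 and 26 percent, which bracket the “something like 30 percent” estimate above.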

Linking is unevenly executed, even within the same publication. The number of links per article depended on who was writing it, the topic, the section of the publication, and probably also the phase of the moon. Even obviously linkable material, such as an obscure politician’s name or a reference to comments on Sarah Palin’s Facebook page, was inconsistently linked. Meanwhile, one anomalous Reuters story linked to the iPad topic page on every single reference to “iPad” — 16 times in one story. (I’m going to have to side with the Wikipedia linking style guide here, which says link on first reference only.)
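Linking only on first reference is easy to enforce when links are inserted automatically. Here is a minimal Python sketch, assuming plain-text story copy and a hypothetical topic-page URL; `autolink` is an illustrative helper, not any outlet's actual tool. Passing `first_only=False` reproduces the link-every-mention behavior of the Reuters iPad story.

```python
import re


def autolink(text: str, term: str, url: str, first_only: bool = True) -> str:
    """Wrap whole-word occurrences of `term` in an HTML link.

    With first_only=True only the first reference is linked; with
    first_only=False every reference is linked.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b")
    # A function replacement avoids surprises from backslashes in the URL.
    return pattern.sub(
        lambda m: f'<a href="{url}">{m.group(0)}</a>',
        text,
        count=1 if first_only else 0,  # count=0 means "replace all"
    )


story = "The iPad launched today. iPad buyers lined up. Rivals eye the iPad."
print(autolink(story, "iPad", "https://example.com/topics/ipad"))
```

Matching on word boundaries leaves “iPads” alone. The function also doesn't check whether a term already sits inside a link, so a real tool would need to parse the markup first.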

Whether or not an article contains good links seems to depend largely on the whim of the reporter at most publications. This suggests a paucity of top-down guidance on linking, which is in line with the rather boilerplate answers I got to my questions about linking policy.

Articles posted to the “blog” section of a publication generally made heavier use of links, especially external links. The average number of external links per article at The New York Times drops from 1.2 to 0.8 if the single blog post in the sample is excluded; that one post had ten external links. Whatever news outlets mean by the word “blog,” they evidently produce their blogs differently from their news stories, at least where links are concerned.
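How much can one post move an average? Working backward from these figures (an inference, not a number reported in the survey): if n stories average 1.2 external links and dropping the ten-link post leaves an average of 0.8, then 1.2n = 0.8(n - 1) + 10, so n = 9.2 / 0.4 = 23. Because both averages are rounded to one decimal place, anything from roughly 19 to 30 stories fits, which is in line with the “~20 stories for each organization” the author mentions in the comments below. The Python sketch just does that arithmetic; the function names are illustrative.

```python
def implied_story_count(mean_with: float, mean_without: float, outlier: float) -> float:
    """Solve mean_with * n == mean_without * (n - 1) + outlier for n."""
    return (outlier - mean_without) / (mean_with - mean_without)


def mean_without_outlier(mean_with: float, n: int, outlier: float) -> float:
    """Average of the remaining n - 1 stories after removing one outlier story."""
    return (mean_with * n - outlier) / (n - 1)


print(round(implied_story_count(1.2, 0.8, 10)))       # 23
print(round(mean_without_outlier(1.2, 23, 10), 2))    # 0.8

# Rounding bounds: the true averages lie within 0.05 of the reported ones.
low = implied_story_count(1.25, 0.75, 10)   # 18.5
high = implied_story_count(1.15, 0.85, 10)  # about 30.5
print(low, round(high, 1))                   # 18.5 30.5
```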

The wire services don’t link. Stories on the Reuters website (as distinguished from stories delivered on Reuters professional products) had an average of 1.7 links per article, but only 0.3 of these links were not to topic pages, and only blog posts had any external links at all. Stories read on Reuters professional products sometimes contain links to source financial documents or other Reuters stories, though it’s not clear to me whether these systems use or support ordinary URLs. The Associated Press has no hub news website of its own, so I couldn’t include it in my survey. Stories pushed to AP customers through its standard feed do not include inline links, though they sometimes include links in an “On the Net” section at the end of the story.

As I wrote previously, Reuters and AP told me that the reason they don’t include inline hyperlinks is that many of their customers publish on paper only and use content management systems that don’t support HTML.
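One way a feed could serve both kinds of customer is to keep links in the source copy and strip them only for plain-text clients, moving the URLs to a trailing list loosely modeled on the AP “On the Net” section. Here is a minimal sketch using Python's standard-library `html.parser`. It is an illustration, not how Reuters or AP actually build their feeds; the list layout is made up, and paragraph breaks are not preserved.

```python
from html.parser import HTMLParser


class LinkFlattener(HTMLParser):
    """Keep a story's visible text and remember where its inline links pointed."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so entities are decoded
        self.text_parts: list[str] = []
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") if tag == "a" else None
        if href and href not in self.links:  # list each URL once, in reading order
            self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def to_plain_text(story_html: str) -> str:
    """Drop inline links from the text and append their URLs as an "On the Net" list."""
    parser = LinkFlattener()
    parser.feed(story_html)
    parser.close()  # flush any trailing text
    body = "".join(parser.text_parts).strip()
    if not parser.links:
        return body
    return body + "\n\nOn the Net:\n" + "\n".join(parser.links)


print(to_plain_text('Officials cited <a href="https://www.usgs.gov/">USGS data</a>.'))
```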

What does this all mean? The link has yet to make it into the mainstream of journalistic routine. Not all stories need links, of course, but my survey showed lots of examples where links would have provided valuable backstory, context, or transparency. Several large organizations are diligent about linking to their own topic pages, probably with the assistance of automation, but are wildly inconsistent about linking to anything else. The cultural divide between “journalists” and “bloggers” is evident in the way that writers use links (or don’t use them), even within the same newsroom. The major wire services don’t yet offer integrated hypertext products for their online customers. And when automatically generated links are excluded, online-only publications tend to take links more seriously.

  • Lyn Headley

    I think it’s fascinating that a single blog post at the NY Times is the real driver behind the Times’ seemingly heavy linking, and I wonder how many readers caught this. I also wonder what explains this disparity. Is it because the editorial oversight given to bloggers is more lax? Does the New York Times have a different standard for bloggers than it does for its other reporters and content producers? Does this extend into other areas like language use and choice of stories to cover?

  • Paul Peters

    What does it all mean? I think the answer lies in the value that links possess, as Google uses them to determine who gets ranked in the SERPs. No one wants to hand that value off to another website. And, if at all possible, they want to maximize it for themselves, hence the linking back to their own topic pages and the massive number of iPad links in the Reuters story, rather than linking out to someone else who may have more relevant info.

  • Pingback: Link Strategy |

  • Kevin Heisler


    Enjoyed your research and ongoing campaign to encourage online media sites to increase the number and quality of outbound links. Fascinating to see how your journalist’s methodology complements academic research by Juliette De Maeyer (via Twitter, @juliettedm):

    Great catch by Lyn Headley @laheadle on the link gap between the NYT online print edition and NYT blogs in WordPress. NYT blogs link generously; NYT Op Ed tends to link liberally. It seems fairly clear that NYT links frequently to its Topics database, just as TechCrunch links to its CrunchBase db. The NYT link policy also seems fairly clear for proper nouns: links to official websites (prominent public figures), to their Facebook page when relevant, to websites of network and cable TV shows, etc. More interesting would be the linking practices of specific journalists, bloggers and op-ed writers over time.

    In terms of comparing the news sites, you ended up with significantly different news categories. How did you choose the stories and sections?

    Finally, what’s more annoying than the distraction of in-paragraph links? The animated NYT flying-in link box/editorial advertising banner. Nick Carr must love that one.

  • Jonathan Stray

    I chose stories by going to the home page of each site and reading every story listed on that page, excluding small print sidebars or long lists below a divider. This gave an average of ~20 stories for each organization, hopefully stories that they themselves considered important.

    Even so, it’s clear that this sample isn’t large enough to be truly definitive. The Times only put one “blog” story on its front page that day, which did drive up the average, but you couldn’t call it statistically significant. However, I feel confident saying that the blog sections of the Times and many other news organizations are in general much better linked, just from my general browsing experience.

    I really don’t know why this is, or what is supposed to differentiate a “blog” from “real news.” Is there a difference in production process? Can I know what this difference is?

    As far as adding links for SEO, this certainly seems to be part of the current linking strategies of many organizations. The topic page SEO video I reference in the article says as much explicitly. But that video also discusses how external links are necessary to make the strategy work. Simple “juice goes out” metaphors are a very long way from how Google actually works. It wouldn’t surprise me if pages on sites with few outbound links were actually penalized by Google’s current algorithm, for the good reason that such sites are often less valuable to searchers.

    Finally, I discovered De Maeyer’s work yesterday and I am very much a fan. Her automated methodology is a great complement to mine, and her visualizations really illuminate the structure of linking between different news orgs and blogs. I find it significant that she comes to essentially the same conclusion as me:

    “It is established that links with a journalistic purpose are scarce as compared to links with a commercial or practical aim.”

    If you enjoyed my little linking survey, then you will probably enjoy reading her paper — and such pretty pictures!

    – Jonathan

  • Juliette De Maeyer

    Fascinating data and analysis!

    I was a bit surprised by BBC News’s weak score, as it seems to me that they have an active linking policy. But then I read your article again more carefully and noticed you had excluded links in sidebars, where all the BBC “related internet links” are situated. During your exploration, did you notice other news sites that have chosen to display links in sidebars rather than in-paragraph?

    That’s a question I face almost every day (and which can become a real nightmare when using automated crawling tools): which links must be taken into account?

    BTW, I started to explore BBC News links a few months ago (but then I had to focus on French news sites, so I couldn’t analyze BBC links in depth): what struck me at first sight was the large number of links to government websites. For example, if I randomly click on today’s headlines, there’s an article about a Finnish court sentencing a Rwandan preacher to life. And the “related internet links” point to the Finnish and Rwandan governments’ sites. I believe this really calls into question the “journalistic (added) value” of links…

  • Pingback: This Week in Review: A mobile aggregation dustup, journalists and the link, and fan-based local sports » Nieman Journalism Lab

  • Pingback: Paywalling Blogs is Good News, Sort Of « The Written Word and Other Fantastic Creatures

  • Steve Herrmann

    Hi Jonathan, I’m the Editor of the BBC News site and I got a bit of a shock when I saw we came up with ZERO links in your quick sample.

    But I don’t think it’s as it seems – as Juliette points out above, our external links are displayed on story pages in the right hand column alongside the story text, not within it. That’ll change soon, but it’s the way our page templates work right now.

    So, in fact, although I haven’t checked them all, a quick look at three or four of the BBC stories in your spreadsheet shows we linked, for example, to the US Geological Survey, Nasa’s Space Shuttle site (along with five other news organisations’ reports on the shuttle launch) and Xinhua news agency. Maybe you saw them and deliberately didn’t count them because of where they are located, but right now that’s where most of our external links have to live on story pages.

    Anyway, I was relieved to see the stories I looked at were not bereft of links, because as you know we are working to develop and improve our external linking so zero links would have been, well, kind of disappointing to say the least …

    Aside from that point, I think the argument you are making is timely, interesting and broadly right.

    Steve Herrmann

  • Jonathan Stray


    First of all, thanks for taking a look at this.

    As you note, the BBC looks pretty anti-link in this survey, and I know from your previous writing that the BBC is certainly at the forefront of linking thinking among news organizations.

    The problem comes down to how you count links. I chose to count only inline links — links written directly into the text — for several reasons:

    – they are much less likely to be auto-generated than “see also” panels, and thus represent a higher standard of journalistic relevance.

    – I believe that the position and text of an inline link adds context that is crucial to hypertext storytelling.

    – it wouldn’t make sense to count every single link on the page, including navigation and advertising. I had to choose some sort of definition of what constituted “part of the story.”
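The inline-only counting rule can be sketched in code. This minimal Python version uses the standard-library `html.parser` and assumes, hypothetically, that the story text sits inside an element with the class `story-body` and that the markup is well formed (every non-void tag is closed). Under those assumptions, navigation, sidebar, and “related stories” links fall outside the count, the same effect that kept the BBC’s right-hand-column links out of this survey.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class InlineLinkCollector(HTMLParser):
    """Collect hrefs of <a> tags that sit inside the story-body element only."""

    def __init__(self, body_class: str = "story-body") -> None:
        super().__init__()
        self.body_class = body_class  # hypothetical class name for the story text
        self.depth = 0                # > 0 while inside the story body
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if self.depth > 0:
            if tag not in VOID_TAGS:
                self.depth += 1
            if tag == "a" and attr_map.get("href"):
                self.links.append(attr_map["href"])
        elif self.body_class in (attr_map.get("class") or "").split():
            self.depth = 1  # entering the story body

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1


def inline_links(page_html: str) -> list[str]:
    """Return the hrefs of links written into the story text itself."""
    collector = InlineLinkCollector()
    collector.feed(page_html)
    collector.close()
    return collector.links
```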

    Juliette, you touch on this point as well. I read your paper and I could not find a precise description of how you decided to include or exclude links on a page. I imagine you excluded advertising. What about “related stories”? And how did you decide if a link added “journalistic value”? E.g., you mention links to train timetables, but wouldn’t these be valuable links if they appeared within a story about trains?

    The question of “what is a valuable link” is a tricky one; it’s one of the reasons I decided to treat topic page links with so much scrutiny. I freely admit that the categories I used for counting are arguable.

    – Jonathan

  • Pingback: Links and journalism: what is at stake? « Juliette D. M.

  • Pingback: Dénouer l’écheveau des liens – Media Trend

  • Jared Stein

    I wasn’t surprised to learn that you excluded the BBC’s sidebar external links because they are not inline; inline linking is fun and popular, but not necessarily better for comprehension or attention.

  • Pingback: Jonathan Stray » The editorial search engine

  • Pingback: Jonathan Stray » The newspaper form serves no one
