Nieman Journalism Lab
Pushing to the future of journalism — A project of the Nieman Foundation at Harvard

How The Associated Press will try to rival Wikipedia in search results

Yesterday we revealed plans by The Associated Press to hold back some content from member websites. (Great discussion going on there, by the way.) The primary motivation of that initiative is search: AP material that resides on hundreds of disparate sites at the same time will hardly rate in Google compared to a single page with hundreds of links pointing to it. That’s a fundamental tenet of search engine optimization.

The same philosophy is driving their plan to build “news guide landing pages” that will aggregate the AP’s content around subjects, places, organizations, and people. Think of the topic pages on sites like The Chicago Tribune, BBC, and others — except that the AP will be harnessing its vast network of members and customers in what could amount to a brilliant SEO play.

The landing pages were first mentioned at the AP’s annual meeting in April, but further details haven’t emerged until now. In material distributed to some members last month, the news guide is described as “a central location to which headlines, promotional products and other content developed by AP could point.” What that will mean in practice is similar to what you find in the digital content of other news organizations: All references in AP articles to, say, Bill Clinton would link to the landing page with aggregated content and other material about the former president.

But, of course, those links to the landing pages would come from member news sites with excellent PageRank, the key metric used by Google to determine search results. (For instance, CNN, which carries AP content, has the maximum and extremely rare PageRank of 10.) It’s easy to see how the AP’s landing pages could, in short order, shoot up near the top of results for popular, news-related search terms.
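
To make the link arithmetic concrete, here is a minimal sketch of the textbook, simplified form of PageRank. It is not Google's actual ranking system, and the site names (`member_1`, `copy_1`, `landing`) are hypothetical. Five member sites each link to their own copy of a story and to one shared landing page; the sketch compares how much rank each target collects.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its rank (ranks sum to 1).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank everywhere.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: each member links to its own copy and to the landing page.
graph = {
    f"member_{i}": [f"copy_{i}", "landing"] for i in range(1, 6)
}
ranks = pagerank(graph)

print("landing page rank:", ranks["landing"])
print("single copy rank: ", ranks["copy_1"])
```

In this toy graph the landing page collects a share of rank from all five members, while each scattered copy collects from only one, which is the effect the AP appears to be counting on. Real search ranking weighs many more signals than this.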

Competing with Wikipedia

The document that I referenced yesterday, the one labeled “AP CONFIDENTIAL — NOT FOR DISTRIBUTION,” includes four pages of sharp, if widely accepted, analysis of how news is consumed today. Referring to coverage of Michael Jackson’s death, it says:

Two of the biggest beneficiaries of that traffic bonanza were Twitter and Wikipedia, a couple of digital natives that would have been viewed as very unlikely news competitors even a few months ago. Indeed, a new pattern of consumption was validated in the confusing minutes that followed the first reports of Jacko’s death: Users shared; they searched and they clicked on Wikipedia….

The Wikipedia page on Michael Jackson is not very pretty to look at, but it has more blue hyperlinks than black type. Forget the “wiki” method of community updating, the key to Wikipedia’s success is that its pages are designed to catch traffic, provide key information and then send users on their way to deeper engagement on the subjects they’re interested in.

(According to Wikipedia traffic statistics, the site’s Michael Jackson page has been viewed more than 24 million times since his death.)

There’s further discussion of Wikipedia’s dominance in search results, which is a product of all the external links pointing to Wikipedia and a variety of other factors. As Sage Ross, the editor of the in-house newsletter Wikipedia Signpost, told me yesterday, “Google juice goes in, swishes around, doesn’t come out.” And that’s clearly what the AP would like to emulate, although it’s less clear how they’ll generate many links beyond member and customer sites. The document states flatly, “The Wikipedia model of standing, authoritative pages could be challenged.”

Proof of concept

Most of the AP’s landing pages would be automatically generated, although “editorial curation” would also be possible. That’s the model followed by sites like The New York Times, which has had decent success with Times Topics. In an internal memo late last year, Times editors boasted, “Many months of SEO labor…helped promote our Credit Crisis page to the prominence it deserves; search for “credit crisis” on Google and our Topic Page comes up first.” (It should be noted, though, that the Times page has been passed since that memo; it currently ranks behind The Crisis of Credit, a terrific 11-minute video describing credit concepts made by a young man named Jonathan Jarvis as part of his master’s thesis at a design college. Take from that what you will.)

The AP is also hoping it can convince members to join the project and have their content aggregated on the landing pages as well. (Of course, plenty of websites, citing fair use, do that already without any formal partnership.) The material distributed last month notes that the landing pages could “facilitate paid distribution of AP and member content,” although I don’t get the sense that’s a priority. As with the strategy I described yesterday, there’s a real question of balance here: It’s obvious what the AP gains, but members will obviously want to know what’s in it for them.

An SEO firm called EveryZing recently produced a trial run of the AP’s landing pages, according to their vice president for client services and business development, Bob Fogarty. EveryZing has also created topic pages for Fox News and Newsweek. In the latter case, the project is actually called…Newsweekopedia.

Much of this strategy follows what Google vice president Marissa Mayer suggested in recent testimony to Congress. It’s also in line with the research of Matt Thompson, who’s currently online community manager at the Knight Foundation. And I wrote about these ideas when Google News began including Wikipedia in its search results.

                                   
  • http://www.ibsys.com Andy Kruse

    If the AP is working WITH its subscribers to boost the search-engine standing of these new hosted landing pages, it’ll have to offer clean links back to its subscribers’ content in the case of relevant, original reporting.

    Wikipedia is able to retain all of its “Google juice” because it employs the nofollow attribute on all outbound links. All the juice flows one way, because people want to link to Wikipedia.

    If AP mandates links in, it should offer links out as well.

  • http://www.mltda.com Dave Levy

    Starting at zero, with a new “AP News” site that carries a wall (that is not hosted as “sourced” on Google News), is the challenge. The property has to be worth optimizing and, well, exist, right? You don’t SEO a piece of content, you SEO a site.

    Could it work? Maybe. But if news wires come with requirements to link, there will be a non-prof that figures a way. Or, alternatively, if we want to think really far outside the box, maybe that’s what Wikipedia could already be considered.

    —–

    @Andy Kruse – excellent point. Links can’t work just one way. Getting a link from Wikipedia is an important part of why people trust it and are willing to link back. Let’s see the AP point outward, too.

  • http://ragesossscholar.blogspot.com Sage Ross

    Andy Kruse: It’s worth noting that although Wikipedia now uses nofollow (since 2007), its rise to search engine dominance happened before that. Nofollow was actually implemented precisely because high PageRank had made it such a tempting target for linkspam. Wikipedia has such dense internal linking (and article titles as anchor text) that even without nofollow it still gets more than enough Google juice.

    Within the Wikipedia community, we actually think of the Google juice trapping aspect of nofollow as a negative; we’d rather the curated external links we include in articles get that extra boost, to improve the quality of search results. It was just way too tempting for spammers and SEOs to abuse the system and insert irrelevant or marginal links just to gain PageRank.

    So, for example, Wikipedia does use follow links for “interwiki” links: links to other non-abusive wiki projects. In the future, it’s possible that Wikipedia will implement a broader external link whitelist to try to return to contributing to the link economy without getting spammed unreasonably.

  • Fakt Chkr

    CNN.com doesn’t use AP content.

  • HaeB

    @Sage Ross: Is there actual evidence for this effect (switching on nofollow for outgoing links raises a site’s PageRank)? I haven’t been able to find a reference, only some remarks to the contrary.

    It should be noted that nofollow was already introduced for all Wikipedia language versions in January 2005. Only the English Wikipedia switched it off soon afterwards, and reintroduced it in January 2007. So if this “Google juice hoarding” effect really exists, it should have been possible to observe the English Wikipedia falling behind the other language Wikipedias during these two years (interwiki links notwithstanding).

  • http://javaunmoradi.com/blog Javaun Moradi

    Zach, thanks for continuing to cover this as it unfolds. I’m a search guy — not a journalist — so I’m not familiar with the terms between AP and members. Clearly this has ramifications beyond online strategy and will have big implications for AP’s existing revenue mix.

    While AP has a lot of unique content, I think they will find it far more difficult to dominate topical news searches than they realize.

    I’m skeptical on the future SEO value of topical archives. Times Topics was first-mover and still has the richest one out there, but these archives are becoming almost a commodity. As you mention, Newsweek and Fox have them, and NPR launched one, powered by Daylife: topics.npr.org. I think these archives offer a great browsing experience and ultimately lead to a lot of content discovery and value for the hosting site. But now that everyone has one, I think their search value can only go down. Google may devalue these pages algorithmically as they’ve done in the past to every other automated directory page outside of news, or users may simply avoid clicking them in the results (which will also eventually cause those rankings to drop). What you won’t see is a Google results page with 10 automated topics for a general query — and even if it happened, who’s to say that any one player can stay dominant?

    Wikipedia pages have all of the structural elements to do well in search, but above all they are living, up-to-date documents listing all of the pertinent facts about a given subject on a *single page*: where was Michael Jackson born, how many albums did he sell, where did he go to school, etc. Each of those questions is a long-tail search query that an automated topic page consisting of article blurbs probably can’t answer. For many web searches, Wikipedia not only answers the initial question but is also the perfect jumping off point to learn more about a subject. It is hard to create this kind of value without constant human editing, and their crowd-sourced model gives them a big advantage here.

    As Sage stated, Wikipedia “nofollows” outbound links (meaning it strips SEO authority from them) because these pages were easy targets for spammers. The current wisdom among SEOs is that quality outbound links actually help a website’s search engine credibility, not that a site like Wikipedia needs any more authority.

    Of course, the 10 search results you see in Google on page 1 are different from the 10 I see, and this is largely determined by the sites we’ve visited in the past (Google calls this “personalized search”).

    And that’s a good segue: personalized search reflects repeat visits (loyalty) and name recognition (brand identity).

    Google CEO Eric Schmidt has said a lot about brand loyalty in recent months and the value and credibility big brands bring to search. Algorithm changes in early 2009 appear to have given big brands a boost in search.

    Ending on that note, it would be supremely ironic if the secret to AP’s success turned out to be positioning its brand and value proposition correctly to its intended audience. Whoever that turns out to be…

    Javaun Moradi
    Product Manager, Search
    NPR Digital Media

  • http://javaunmoradi.com/blog Javaun Moradi

    Sorry, typo.

    Ending on that note, it would be supremely ironic if the secret to AP’s success in SEARCH TRAFFIC turned out to be positioning its brand and value proposition correctly to its intended audience. Whoever that turns out to be…

  • http://ragesossscholar.blogspot.com Sage Ross

    Thanks, Javaun.

    HaeB, apparently I was wrong in assuming that limiting the followable outgoing links leaves more Google juice for internal links.

    In its most basic form, PageRank and similar algorithms don’t differentiate between internal and external links, since the individual webpage is the unit of analysis. But obviously Google uses much more complicated and fine-tuned methods than that, and I don’t know much about the state-of-the-art in divining what all goes into them.

  • http://notnews.today.com/ David Gerard

    It’s also worth noting that Wikipedia does nothing whatsoever to search-engine-optimise … we try to do a good site and it’s highly-ranked because so many people use it.

    (Around 2005 we had a situation where Wikipedia mirrors would be ranked way higher than Wikipedia itself in Google, such that you’d get three pages of mirror sites before the actual source text! Some people asked Google what was up with that and it doesn’t happen any more. But we have no idea what happens internally at Google.)

    Most of the “link juice” complaints are from search-engine optimisers. Wikipedia cares about its readers, and has little to no interest in helping a third party (SEOs) get in good with a fourth party (Google). It’s nice to give good links follow benefits, but the linkspammers were a goddamn pain.

  • http://lloydbudd.com/ Lloyd Budd

    Seems like this development could really affect Mahalo.com

  • Zachary M. Seward

    Just wanted to say I’ve been enjoying this discussion and learning a ton. Thanks, everyone. I’ll keep these comments bookmarked for future posts on SEO for news sites. —Zach

  • CarolN

    What is more troubling for the AP in comparison to Wikipedia is their attack last year on bloggers who quote more than 5 words from an AP story without paying.

    I haven’t heard much lately on updated policies from them about quoting and linking, but I know there are bloggers out there who are still avoiding even linking to AP stories out of principle.

    Unless their interpretation of fair use has been updated to be similar to the licensing of Wikipedia as Creative Commons Attribution ShareAlike, I don’t see their SEO efforts getting them the traffic that they want.

  • http://www.tribuneinteractive.com/network Brent D. Payne

    Fascinating concept. One that won’t have the success that the AP is hoping for, however.

    I run the SEO for Tribune (btw, thanks for the link to the Barack Obama topic page). That means I run it for about 70 domains now including latimes.com, chicagotribune.com, baltimoresun.com, orlandosentinel.com, sun-sentinel.com and a couple dozen TV station websites like ktla.com, wpix.com, wgntv.com, etc. In addition to that we have several vertical sites like zap2it.com.

    So . . . my point here is that I deal with a large network of really powerful sites in regards to PageRank, link strength, domain age, authority, content credibility, etc. Unfortunately, I also deal with a significant duplicate content problem (it’s a Content Management System problem that I’m trying to fix, and I will, in the short term). If you take a look at the following search results you will see where a story published by the Chicago Tribune has been copied to 2,740 (yes, two thousand seven hundred forty) different URLs.

    Look for yourself…
    http://www.google.com/search?hl=en&safe=off&pws=0&q=inurl:chi-michelle-obama-dress-story&start=10&sa=N&filter=0

    Now, if you are an SEO, stop laughing (or crying), and if you aren’t, continue to listen for a bit longer as I prove why the AP strategy won’t work as well as they expect.

    If you look at any of the 2,740 dupes (there you go laughing again) you will note that there are several ‘inline’ links to other Topic pages in that story. But . . . none of them (at least none if you put the &pws=0 in the URL, which wipes out ‘Personalized Web Search’) show up on the first page of Google. They do okay, 2nd page . . .

    Hmmm. Why is that? Well, here’s why. It’s duplicate content. Google knows this . . . how do we know they know this? Well, when you hit the URL below you will find that there are only two results.

    http://www.google.com/search?hl=en&safe=off&q=inurl%3Achi-michelle-obama-dress-story&pws=0

    What happened to the other 2,700 or so? Oh, they removed them from the results because they were duplicates.

    As much as I wish that all my duplicate content pages (which, I agree it’s horrible to have duplicate content pages–I said I’m fixing it) would pass PageRank back to the other links in the content like the non-dupes do, the truth of the matter is they don’t.

    They are, what used to be called, ‘supplemental results’. Google got rid of the supplemental index a few years ago (darn us SEOs making their life more difficult) but these results are in there. Why? Well because no user wants to see the same content 2,740 times.

    So, Google will simply see that there are 10,000 copies of the same content out there and that it is all linking to the same locations and . . . discredit (or significantly minimize) the ‘quality’ and ‘authority’ of those links.

    Here’s the funniest part though. Since they have now started charging bloggers by number of words quoted for an AP story instead of letting people freely link to their stories or use their stories in blurbs (like normal people do, wink), blurbs that are part of a page that IS unique content, they have managed to cut off the best stream of links to their sites and their content. Talk about cutting off their foot at the neck . . . bad strategy. You WANT links to you, AP, and you WANT people to blurb your content. Sure, not stealing the whole story, but using a blurb of it is a really good thing. Look up ‘citation rank’ sometime or call Josh/Abe at Google News or look up what Google just said at SES San Jose in their Google News presentation. Again, not a wise move.

    Now let’s point out one more thing. You’d think that my placement on the first page for ‘Barack Obama’ (again, thanks for the link) would drive a TON of traffic, right? I mean, after all, Barack is not only the leader of the free world but he’s also a bit of a celebrity too (admit it, he is). But I’ll release some proprietary information here and tell everyone that we received 7,855 visits to all of our sites for the keyphrase ‘barack obama’ in July. Why? Because we aren’t #1 for the query. Plus, people don’t search like that. People don’t search for ‘Chicago’, and very few search for ‘Chicago News’ (though we did receive 25,387 visits for that term in July, but we ARE number one for that term, including site links). Some may say that IS a lot of traffic. It’s not…we receive over a million visits per day, from just Google. How? It’s about the long tail. What’s that? Ranking well on hundreds of thousands of longer keyphrases (3-4 word queries). Plus, I’ll admit, other things that I’m not going to mention (no need to increase the competition).

    So why am I saying this? After all, AP is a competitor, right? Trust me. I know the type. I could prove to them a thousand different ways that this is just silly and they wouldn’t listen. They have made up their mind.

    End rant. ;-)

    Brent D. Payne
    Director, SEO
    Tribune Company
    @BrentDPayne
