Nieman Journalism Lab
Pushing to the future of journalism — A project of the Nieman Foundation at Harvard

What qualifies as a Spotlight story on Google News? Here are a few clues

Google News launched a Spotlight section back in September to highlight “in-depth pieces of lasting value.” Initial response was positive, and now that the feature has a few months under its belt, I checked in to see whether it is living up to that first flush of excitement.

The verdict?

It all depends on how you define “in-depth” and “lasting value.” The material on the page is certainly different from what you typically find on Google News. It’s a nice sample of deeper stories. But visiting the section doesn’t inspire the curiosity and intellectual satisfaction you’d get from a great magazine, newspaper or documentary film. “Lasting” isn’t a word that springs to mind. I’m guessing that has something to do with the algorithm.

Getting around the algorithm issue

The Spotlight page, like all of Google News, is automatically generated by one of Google’s secret algorithms. It’s impossible to discern exactly how stories are selected because Google guards its algorithms the way Kentucky Fried Chicken protects those 11 herbs and spices.

But if Google News’ general ranking rules apply to the Spotlight page, there might be a few clues within this video from Maile Ohye, a tech lead at Google (full transcript is here). In the video, Ohye notes that Google uses keywords to categorize articles within Google News. That’s how a story ends up in business, sports, etc. Ohye used the following example to describe the classification process:

So you can see on this article, “The Millions Kozlowski Didn’t Steal.” We actually take out individual words, like business, Tyco, money, and CFO, and understand that this article pertains to the section of business.
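Ohye’s description suggests a simple keyword-matching scheme. The Python sketch below illustrates only that general technique; the section names and keyword lists are hypothetical, and Google has not said how its own classifier actually works.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical keyword lists for illustration; Google's real signals are not public.
SECTION_KEYWORDS = {
    "business": {"business", "tyco", "money", "cfo", "earnings"},
    "sports": {"game", "season", "coach", "draft", "playoffs"},
    "science": {"moon", "nasa", "research", "scientists"},
}


def classify(text: str) -> Optional[str]:
    """Return the section whose keywords occur most often in text, or None if none occur."""
    # Lowercase the text and count each alphabetic word.
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score each section by the total occurrences of its keywords.
    scores = {
        section: sum(words[keyword] for keyword in keywords)
        for section, keywords in SECTION_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify("The Millions Kozlowski Didn't Steal: Tyco's CFO and the money"))  # business
```

A story that matches no keyword at all comes back as `None` rather than being forced into a section.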

Extending this a bit, it’s possible that Spotlight articles are selected partly on the basis of a list of keywords and phrases. I’m thinking words like society, impact, and trend could signal the kind of bigger, deeper stories appropriate for Spotlight. On a lark, I combined all the text from 10 Spotlight stories into a Wordle cloud to see if any “lasting” words stood out. No luck on that front, though.
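Wordle sizes each word according to how often it appears in the input. The counting step behind a cloud like that can be approximated in a few lines of Python; the stopword list here is a small illustrative sample, not Wordle’s own, and `top_words` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import List, Tuple

# Small illustrative stopword list; Wordle uses its own, larger list.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
    "is", "are", "was", "were", "it", "that", "this", "with", "as",
}


def top_words(text: str, n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent words that are not stopwords, most frequent first."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return counts.most_common(n)
```

Running it over the combined text of the 10 stories and checking whether words like society, impact, or trend rank near the top amounts to the same experiment as the Wordle cloud, just with exact counts.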

Truth is, there’s no way to fully understand how Spotlight stories become Spotlight stories because Google goes mum whenever algorithms are discussed. I asked. They politely declined.

So I went with the next best thing: grunt work. I took a snapshot of the Spotlight page on Jan. 4, 2010 at 12:02 p.m. and dug into the top 10 stories to see if any obvious commonalities were at play. (These are the same 10 stories I plugged into Wordle.) Here’s what I found:

Length: five of the 10 stories were more than 1,000 words long.

Posting date: seven stories were published four days before I took the snapshot (Dec. 31, 2009).

Comments: six stories had received more than 50 comments.

Source: nine stories were from what I’d consider to be major publishers.
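The first three tallies can be rechecked against the raw data listed below. This Python sketch transcribes that list (the tuple layout and variable names are illustrative choices, and `None` stands in for “Not enabled”) and recomputes the counts; the “major publisher” judgment is subjective, so it is left out.

```python
from datetime import date

# Transcribed from the raw data in this post: (title, published, word count, comments).
# None means comments were not enabled.
STORIES = [
    ("The Biggest Losers", date(2010, 1, 3), 953, 70),
    ("Google Plans Google Voice Enhancements", date(2009, 12, 31), 430, 0),
    ("Come Buy With Me and Be My Love", date(2009, 12, 31), 1865, None),
    ("Civil rights hero caught in corruption probe to begin serving sentence",
     date(2010, 1, 4), 1219, 103),
    ("It's All in How You See It: The Resolution Revolution", date(2009, 12, 31), 1346, 133),
    ("New Year's Resolutions for Washington", date(2009, 12, 30), 830, 236),
    ("2010 Draft prospects in BCS games", date(2009, 12, 31), 1847, None),
    ("Hole in the Moon Could Shelter Colonists", date(2009, 12, 31), 403, 14),
    ("Safety of Beef Processing Method Is Questioned", date(2009, 12, 31), 3090, 383),
    ("3 reasons home prices are heading lower", date(2009, 12, 31), 695, 86),
]

SNAPSHOT = date(2010, 1, 4)  # day the Spotlight page was captured

# Stories longer than 1,000 words.
long_stories = sum(1 for _, _, words, _ in STORIES if words > 1000)
# Stories published exactly four days before the snapshot.
four_days_old = sum(1 for _, published, _, _ in STORIES if (SNAPSHOT - published).days == 4)
# Stories with more than 50 comments (comment-less stories are skipped).
busy_threads = sum(1 for _, _, _, comments in STORIES if comments is not None and comments > 50)

print(long_stories, four_days_old, busy_threads)  # 5 7 6
```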

The stories were all over the map topic-wise: straight news, financial analysis, sports, and even a Wall Street Journal column from Karl Rove. If there’s topical targeting here, I couldn’t find it.

As for the two lingering criteria, “in-depth” and “lasting value,” I’ll say yes on the former and no on the latter. Many of the stories were deep dives into a particular issue, so those certainly qualify as in-depth. In my mind, something achieves “lasting value” if it goes beyond strict just-the-facts reporting or knee-jerk reactions. By that criterion, the New York Times’ “Safety of Beef Processing Method Is Questioned” is the only story that fits. Everything else was fleeting: interesting, certainly, but not likely to be relevant in a few weeks.

Here’s the raw data from my analysis. Let me know if you spot any wayward trends I might have missed.

1. The Biggest Losers
(Wall Street Journal, Jan. 3, 2010)

Type: Opinion
Word count: 953
Comments: 70

2. Google Plans Google Voice Enhancements
(TMCnet, Dec. 31, 2009)

Type: News analysis
Word count: 430
Comments: 0

3. Come Buy With Me and Be My Love
(New York Times, Dec. 31, 2009)

Type: Feature story
Word count: 1,865
Comments: Not enabled

4. Civil rights hero caught in corruption probe to begin serving sentence
(CNN, Jan. 4, 2010)

Type: News story
Word count: 1,219
Comments: 103

5. It’s All in How You See It: The Resolution Revolution
(Huffington Post, Dec. 31, 2009)

Type: Advice column from Mehmet Oz, M.D.
Word count: 1,346
Comments: 133

6. New Year’s Resolutions for Washington
(Wall Street Journal, Dec. 30, 2009)

Type: Opinion piece by Karl Rove
Word count: 830
Comments: 236

7. 2010 Draft prospects in BCS games
(SI.com, Dec. 31, 2009)

Type: Sports analysis
Word count: 1,847
Comments: Not enabled

8. Hole in the Moon Could Shelter Colonists
(FOXNews.com, Dec. 31, 2009)

Type: News story
Word count: 403
Comments: 14

9. Safety of Beef Processing Method Is Questioned
(New York Times, Dec. 31, 2009)

Type: Investigative report
Word count: 3,090
Comments: 383

10. 3 reasons home prices are heading lower
(CNNMoney.com, Dec. 31, 2009)

Type: Financial analysis
Word count: 695
Comments: 86

                                   
  • http://jonathanstray.com Jonathan Stray

    I think you’re probably thinking along the wrong lines in terms of how Google defines “lasting value,” imagining it to be based on editorial selection of key words.

    Google doesn’t do that, which is in my opinion part of their value. They don’t make semantic assumptions (remember, the small Google News team would have to do this for 40 or so languages) and they’re in the business of expressing what everyone else thinks, not voicing their own opinion.

    If I were assigned the project of algorithm development for the Spotlight section, I’d gather data like the time distribution of comments and new links. If people are still commenting on a story at the same (high) rate as they were five days ago, it’s probably not a flash-in-the-pan bit of spot news.

    The thresholds could even be calibrated automatically by examining the typical distribution of comment/link production for a news story and looking for long-lived outliers.

    I’m not saying this is how Google does it, but this is the sort of thing I’d experiment with.

    BTW, I suspect your understanding of how article categorization works is similarly off. I would do it by generating word vectors for stories (See http://en.wikipedia.org/wiki/Vector_space_model) and comparing them to content known to be human-categorized in specific ways. Again, this is both category and language neutral.

    As a professional computer scientist, I maintain that journalism has a lot to learn from computational linguistics ;)

    – Jonathan
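Jonathan’s comment-rate idea could be prototyped along these lines. This is a hypothetical sketch, not Google’s method: the daily comment counts, the two-day window, the `persistence` ratio, and the 90th-percentile cutoff are all assumptions made for illustration. Each story’s score is its recent comment rate divided by its early comment rate, and the cutoff is calibrated automatically from the distribution of scores across all stories, so only long-lived outliers are flagged. It relies on `statistics.quantiles`, available since Python 3.8.

```python
from statistics import quantiles
from typing import Dict, List


def persistence(daily_comments: List[int], window: int = 2) -> float:
    """Average comments/day over the last `window` days divided by the first `window` days."""
    early = sum(daily_comments[:window]) / window
    recent = sum(daily_comments[-window:]) / window
    # Stories with no early comments are scored 0.0 to avoid dividing by zero.
    return recent / early if early > 0 else 0.0


def long_lived(stories: Dict[str, List[int]], pct: int = 90) -> List[str]:
    """Flag stories whose persistence exceeds the pct-th percentile across all stories."""
    scores = {title: persistence(counts) for title, counts in stories.items()}
    # quantiles(..., n=100) returns 99 cut points; index pct - 1 is the pct-th percentile.
    threshold = quantiles(scores.values(), n=100)[pct - 1]
    return [title for title, score in scores.items() if score > threshold]
```

Because the threshold comes from the scores themselves, it adapts as the mix of stories changes, which is the “calibrated automatically” part of the suggestion.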
