Oct. 27, 2011, 2 p.m.

The Guardian introduces @GuardianTagBot, a “Twitter-based search assistant”

The paper is employing a cheeky robot to help crowdsource its tag taxonomy.

So robots are one step closer to world domination. Or at least to info domination. This morning, the Guardian announced the birth of @GuardianTagBot, the living, tweeting, occasionally sleeping Twitter account that serves as the public face of the Guardian’s content API explorer. Tweet @GuardianTagBot with a search term (or a whole group of search terms), and it’ll @-reply you with a link to Guardian content that matches your query. Whether you’re looking for Nieman Lab mentions in the Guardian (who isn’t?), wondering what Nick Clegg is up to today (ditto), or concerned that David Cameron may be a lizard (um)…the bot probably has your answer.

“It’s rather like playing fetch with our articles, videos, galleries and audio,” Nina Lovelace, the Guardian’s content development manager, explained in a post announcing the tool. Results are ad hoc: you have to re-ask @GuardianTagBot each time you want updated search results. But, as Meg Pickard, the Guardian’s head of digital engagement, points out, if you save a Twitter search for GuardianTagBot, you can see results in real time as well.

Again: world domination. Siri-ously.

The TagBot was developed in collaboration with the social media agency Smesh. When I asked Lovelace for more detail about TagBot’s interface, she replied in an email that the tool:

has been built by Smesh developers, who have implemented a kind of ‘Twitter bridge’ between Twitter and the Guardian’s content API. Smesh have built on their existing infrastructure for realtime tweet-wrangling to capture all incoming tweets @GuardianTagbot via Twitter’s streaming API. Smesh’s software then performs some semantic analysis on the tweets, cross-referencing against a customised version of the Guardian’s tagging database to find likely matches. Smesh then formulates a query against the Guardian content API, performs some additional processing on the results, and builds a dynamic results page to tweet back to the user. Hopefully with excellent quality content matches in!
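To make the pipeline Lovelace describes a little more concrete, here is a minimal Python sketch of the same steps for a single incoming mention: strip the bot’s handle, match what remains against a tag list, build a content-API query, and compose a reply. Everything in it is an assumption for illustration, not Smesh’s code: the `TAGS` dictionary and its tag IDs are made up, the search endpoint and its `q`, `tag` and `api-key` parameters are modeled on the Guardian’s public content API rather than taken from TagBot, the results-page host is a placeholder, and the substring match is a crude stand-in for Smesh’s semantic analysis. Receiving tweets from Twitter’s streaming API and actually posting the reply are left out.

```python
import re
from urllib.parse import urlencode

# Hypothetical stand-in for "a customised version of the Guardian's tagging
# database": lower-cased phrase -> tag id. The ids are illustrative only.
TAGS = {
    "nick clegg": "politics/nick-clegg",
    "david cameron": "politics/david-cameron",
    "nieman lab": "media/nieman-lab",
}

BOT_HANDLE = "@guardiantagbot"
# Assumed endpoint, modeled on the Guardian's public content API search.
SEARCH_ENDPOINT = "https://content.guardianapis.com/search"
# Placeholder host for the "dynamic results page" tweeted back to the user.
RESULTS_PAGE = "https://example.org/tagbot/results"


def extract_query(tweet_text: str) -> str:
    """Drop the bot's handle and any other @-mentions; keep the rest as the query."""
    text = tweet_text.lower().replace(BOT_HANDLE, " ")
    text = re.sub(r"@\w+", " ", text)
    return " ".join(text.split())


def match_tags(query: str) -> list[str]:
    """Crude stand-in for semantic analysis: find known tag phrases in the query."""
    return [tag for phrase, tag in TAGS.items() if phrase in query]


def build_search_url(query: str, tags: list[str], api_key: str = "test") -> str:
    """Prefer a tag-filtered search; fall back to free text when nothing matched."""
    params = {"api-key": api_key}  # placeholder key
    if tags:
        params["tag"] = ",".join(tags)  # assumption: tags combined in one parameter
    else:
        params["q"] = query
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def handle_mention(user: str, tweet_text: str) -> tuple[str, str, list[str]]:
    """Process one incoming mention; returns (api_url, reply_text, matched_tags)."""
    query = extract_query(tweet_text)
    tags = match_tags(query)
    api_url = build_search_url(query, tags)
    reply = f"@{user} Here's what I found: {RESULTS_PAGE}?{urlencode({'q': query})}"
    return api_url, reply, tags


if __name__ == "__main__":
    api_url, reply, tags = handle_mention(
        "alice", "@GuardianTagBot what is Nick Clegg up to today?"
    )
    print(tags)
    print(api_url)
    print(reply)
```

A query that matches no known tag (“shed”, say) falls back to a free-text search, which is also exactly the kind of miss a team refining its taxonomy would want to notice.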

TagBot is, as a robo-infant, a fragile creature; at this point, it can process only 3,000 queries a day. “The current primary limitation is around the degree of semantic analysis that it’s been possible to deliver for a fast beta build,” Lovelace told me. The 3,000-response limitation is based on Twitter’s API limits; the system itself, she said, “is lightweight, fast and scalable” — and “we hope to see the response limit raised or removed if the project continues past its beta phase, which will last a month.” (To account for any service interruptions, the Guardian has created a shadow account for @GuardianTagBot that can answer bot-related questions when TagBot tires out. It’s a telling one: @TagBotsHuman.)

TagBot is cleverly dual-purpose: On the one hand, it’s a useful (and, given the bot’s cheeky anthropomorphism, fun) service for Guardian users — one that, given its Guardian-content-only query returns, has the nice side effect of encouraging pageview-friendly, brand-centric site navigation. But the even-more-interesting innovation is the “please rate me!” request you’ll see below the bot’s search returns — one that asks you to tell TagBot whether it’s been “a good Bot” or “a bad Bot.” (Again: cheeky.) Data culled from @GuardianTagBot searches, Pickard told me, will help the Guardian’s tech editorial team to refine the site’s tag taxonomy — ostensibly both by learning what popular tags might be missing (“shed” has been one weird example) and by collecting semantic search data from users.
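The feedback loop described here could, in principle, start as simply as counting where the bot came up short. The sketch below is a hypothetical illustration, not the Guardian’s method: it assumes each answered query has been logged as a `(query, matched_tags, rating)` record, where `rating` is `"good"`, `"bad"` or `None`, and surfaces queries that repeatedly matched no tag or drew a “bad Bot” rating as candidates for new tags. The sample log is made up.

```python
from collections import Counter
from typing import Optional

# Hypothetical log record: (query text, tags matched, rating: "good", "bad" or None)
Record = tuple[str, list[str], Optional[str]]


def missing_tag_candidates(records: list[Record], min_count: int = 2) -> list[tuple[str, int]]:
    """Count queries that matched no tag or were rated "bad", most frequent first."""
    misses = Counter()
    for query, matched_tags, rating in records:
        if not matched_tags or rating == "bad":
            # Normalize case and whitespace so "Shed" and "shed" count together.
            misses[" ".join(query.lower().split())] += 1
    return [(q, n) for q, n in misses.most_common() if n >= min_count]


if __name__ == "__main__":
    log = [  # made-up sample data
        ("shed", [], None),
        ("Shed", [], "bad"),
        ("nick clegg", ["politics/nick-clegg"], "good"),
        ("lizards", [], None),
    ]
    print(missing_tag_candidates(log))  # prints [('shed', 2)]
```

The `min_count` threshold keeps one-off oddities out of the list; a real editorial team would presumably weigh such candidates by hand rather than add tags automatically.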

“TagBot will definitely make some mistakes,” Lovelace notes in her post, “but it’s here to help us check how well our tagging system is working.”

Essentially, the Guardian is crowdsourcing part of its tag taxonomy, looking to users to help it determine the tags, terms, and overall infrastructure that will best help it do the increasingly crucial job of site organization. If @GuardianTagBot catches on with users, it should provide a valuable, and basically cost-free, dataset for the Guardian’s tech team. Even after just a few hours of life, Lovelace told me, TagBot “is already showing us how our tagging system could be improved to better suit users.”

TagBot will also let the Guardian, Lovelace notes, test whether the paper could one day offer a service that lets users sign up for tag-based content updates delivered through social media. (Maybe something like Google Alerts for Guardian content, delivered through Twitter.) Lovelace, true to the Guardian’s openness, is looking for thoughts on that: “It would be great to get feedback,” she told me, “on how users might want this developed in future.” Suggestions can go to tagbot@guardian.co.uk.

TagBot, as a component of the paper’s digital-first move, is part of a larger effort at the Guardian to demonstrate — to users, to developers, and, significantly, to potential commercial partners — the cool things that can be done with the Guardian’s API. (“For example,” Lovelace says, “we are able to build similar services for clients if they’d like us to, using our great content, on different social networks and beyond.”) It’s yet another way for people to start thinking of the Guardian less as a newspaper, and more as a data platform. “Both the Guardian and Smesh are excited about potential for developing the idea further,” Lovelace says. “We have some great ideas, so watch this space.”
