Oct. 27, 2011, 2 p.m.

The Guardian introduces @GuardianTagBot, a “Twitter-based search assistant”

The paper is employing a cheeky robot to help crowdsource its tag taxonomy.

So robots are one step closer to world domination. Or at least to info domination. This morning, the Guardian announced the birth of @GuardianTagBot, the living, tweeting, occasionally sleeping Twitter account that serves as the public face of the Guardian’s content API explorer. Tweet @GuardianTagBot with a search term (or a whole group of search terms) and it’ll @-reply you with a link to Guardian content that matches your query. Whether you’re looking for Nieman Lab mentions in the Guardian (who isn’t?), or wondering what Nick Clegg is up to today (ditto), or concerned that David Cameron may be a lizard (um)…the bot probably has your answer.

“It’s rather like playing fetch with our articles, videos, galleries and audio,” Nina Lovelace, the Guardian’s content development manager, explained in a post announcing the tool. Returns are ad hoc: you have to re-ask @GuardianTagBot each time you want updated search results. But if you save a search for GuardianTagBot, points out Meg Pickard, the Guardian’s head of digital engagement, you can see results in real time as well.

Again: world domination. Siri-ously.

The TagBot was developed in collaboration with the social media agency Smesh. When I asked Lovelace for more detail about TagBot’s interface, she replied in an email that the tool:

has been built by Smesh developers, who have implemented a kind of ‘Twitter bridge’ between Twitter and the Guardian’s content API. Smesh have built on their existing infrastructure for realtime tweet-wrangling to capture all incoming tweets @GuardianTagbot via Twitter’s streaming API. Smesh’s software then performs some semantic analysis on the tweets, cross-referencing against a customised version of the Guardian’s tagging database to find likely matches. Smesh then formulates a query against the Guardian content API, performs some additional processing on the results, and builds a dynamic results page to tweet back to the user. Hopefully with excellent quality content matches in!
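To make that pipeline a little more concrete, here is a minimal Python sketch of its middle steps: cleaning an incoming tweet, matching it against a tag table, and formulating a content API query. It is only an illustration. The tag table, the tag ids, and the function names are hypothetical, the phrase matching is far cruder than the semantic analysis Lovelace describes, and the endpoint and its `q`, `tag`, and `api-key` parameters are assumed from the Guardian's public content API rather than taken from Smesh's code.

```python
import re
from urllib.parse import urlencode

# Hypothetical miniature tag table (phrase -> tag id). The real customised
# tagging database and Smesh's matching logic are not public in the article.
TAGS = {
    "nick clegg": "politics/nickclegg",
    "david cameron": "politics/davidcameron",
    "twitter": "technology/twitter",
}


def extract_terms(tweet: str) -> str:
    """Lower-case the tweet, then drop @-mentions and punctuation."""
    text = tweet.lower()
    text = re.sub(r"@\w+", " ", text)      # remove @GuardianTagBot and other mentions
    text = re.sub(r"[^\w\s]", " ", text)   # remove punctuation
    return " ".join(text.split())          # collapse whitespace


def match_tags(terms: str) -> list[str]:
    """Return tag ids whose phrase occurs as whole words in the cleaned text."""
    padded = f" {terms} "
    return sorted(tag for phrase, tag in TAGS.items() if f" {phrase} " in padded)


def build_query(tweet: str, api_key: str = "test") -> str:
    """Build a search URL: filter by tag when one matches, else free-text search."""
    terms = extract_terms(tweet)
    tags = match_tags(terms)
    params = {"api-key": api_key}
    if tags:
        params["tag"] = ",".join(tags)     # assumption: comma-separated tags are ANDed
    else:
        params["q"] = terms
    return "https://content.guardianapis.com/search?" + urlencode(params)


if __name__ == "__main__":
    print(build_query("@GuardianTagBot what is Nick Clegg up to?"))
    print(build_query("@GuardianTagBot shed"))
```

Falling back to a free-text query when no tag matches is one plausible design choice; it is also the case that would surface missing tags such as “shed.”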

TagBot is, as a robo-infant, a fragile creature; at this point, it can process only 3,000 queries a day. “The current primary limitation is around the degree of semantic analysis that it’s been possible to deliver for a fast beta build,” Lovelace told me. The 3,000-response limitation is based on Twitter’s API limits; the system itself, she said, “is lightweight, fast and scalable” — and “we hope to see the response limit raised or removed if the project continues past its beta phase, which will last a month.” (To account for any service interruptions, the Guardian has created a shadow account for @GuardianTagBot that can answer bot-related questions when TagBot tires out. It’s a telling one: @TagBotsHuman.)
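A daily cap like that is simple to picture in code. The sketch below is a hypothetical counter, not anything the Guardian or Smesh has described: it allows up to 3,000 replies per calendar day and resets when the date changes. The class and method names are invented for illustration.

```python
from datetime import date


class DailyQuota:
    """Allow at most `limit` replies per calendar day (illustrative only)."""

    def __init__(self, limit: int = 3000):
        self.limit = limit
        self.day = None   # the day the current count belongs to
        self.used = 0     # replies sent so far on that day

    def try_consume(self, today: date) -> bool:
        """Return True and count one reply if the quota allows it."""
        if today != self.day:
            # A new day has started: reset the counter.
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False  # the bot has tired out for today
        self.used += 1
        return True


if __name__ == "__main__":
    quota = DailyQuota()
    launch_day = date(2011, 10, 27)
    sent = sum(quota.try_consume(launch_day) for _ in range(3500))
    print(sent, "replies sent on launch day")
```

When `try_consume` returns False, a bot built this way would have to stay quiet, which is presumably the moment a human-run account such as @TagBotsHuman steps in.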

TagBot is cleverly dual-purpose: On the one hand, it’s a useful (and, given the bot’s cheeky anthropomorphism, fun) service for Guardian users — one that, given its Guardian-content-only query returns, has the nice side effect of encouraging pageview-friendly, brand-centric site navigation. But the even-more-interesting innovation is the “please rate me!” request you’ll see below the bot’s search returns — one that asks you to tell TagBot whether it’s been “a good Bot” or “a bad Bot.” (Again: cheeky.) Data culled from @GuardianTagBot searches, Pickard told me, will help the Guardian’s tech editorial team to refine the site’s tag taxonomy — ostensibly both by learning what popular tags might be missing (“shed” has been one weird example) and by collecting semantic search data from users.
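How might “good Bot” and “bad Bot” votes turn into taxonomy fixes? One simple approach, sketched here in Python under stated assumptions, is to tally votes per search term and flag the terms that users rate poorly. The data shape, the thresholds, and the function names are all hypothetical; the article does not say how the Guardian actually analyses the ratings.

```python
from collections import defaultdict


def summarize_ratings(ratings):
    """Tally votes per search term.

    `ratings` is an iterable of (search_term, verdict) pairs, where verdict
    is "good" or "bad" (a hypothetical record format).
    """
    tally = defaultdict(lambda: {"good": 0, "bad": 0})
    for term, verdict in ratings:
        tally[term.lower()][verdict] += 1
    return dict(tally)


def weak_terms(tally, min_votes=2, max_good_share=0.5):
    """Return terms with enough votes whose share of "good" votes is low."""
    flagged = []
    for term, votes in tally.items():
        total = votes["good"] + votes["bad"]
        if total >= min_votes and votes["good"] / total < max_good_share:
            flagged.append(term)
    return sorted(flagged)


if __name__ == "__main__":
    sample = [
        ("shed", "bad"),
        ("Shed", "bad"),
        ("nick clegg", "good"),
        ("nick clegg", "good"),
        ("lizard", "bad"),
    ]
    print(weak_terms(summarize_ratings(sample)))
```

Terms flagged this way would be candidates for a new or better tag; requiring a minimum number of votes keeps a single grumpy rating from triggering a review.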

“TagBot will definitely make some mistakes,” Lovelace notes in her post, “but it’s here to help us check how well our tagging system is working.”

Essentially, the Guardian is crowdsourcing part of its tag taxonomy, looking to users to help it determine the tags, terms, and overall infrastructure that will best help it do the increasingly crucial job of site organization. If @GuardianTagBot catches on with users, it should provide a valuable, and yet basically cost-free, dataset for the Guardian’s tech team. Even after just a few hours of life, Lovelace told me, TagBot “is already showing us how our tagging system could be improved to better suit users.”

TagBot will also allow the Guardian, Lovelace notes, to test whether, in the future, the paper could provide a service that would allow users to sign up for tag-based content updates delivered through social media. (Maybe something like Google Alerts for Guardian content, delivered through Twitter.) Lovelace, true to the Guardian’s open interface, is looking for thoughts on that: “It would be great to get feedback,” she told me, “on how users might want this developed in future to tagbot@guardian.co.uk.”

TagBot, as a component of the paper’s digital-first move, is part of a larger effort at the Guardian to demonstrate — to users, to developers, and, significantly, to potential commercial partners — the cool things that can be done with the Guardian’s API. (“For example,” Lovelace says, “we are able to build similar services for clients if they’d like us to, using our great content, on different social networks and beyond.”) It’s yet another way for people to start thinking of the Guardian less as a newspaper, and more as a data platform. “Both the Guardian and Smesh are excited about potential for developing the idea further,” Lovelace says. “We have some great ideas, so watch this space.”
