April 1, 2013, 10:18 a.m.
Audience & Social

Not an April Fool’s joke: The New York Times has built a haiku bot

Times Haiku are generated from stories on the homepage of NYTimes.com, just in time for National Poetry Month.


New York Times senior software architect Jacob Harris has a thing for robots and wordplay. You may recall he’s the guy behind @nytimes_ebooks, the Times’ answer to the elusive and inscrutable Twitter bot @Horse_ebooks.

So it’s only natural that Harris has now created an algorithm that extrudes haiku out of the text of Times stories. In other words:

Haiku harvester
built inside The New York Times —
does it have a soul?

(If my eighth-grade English teacher is reading this: sorry.)

Here’s a better, more Times-y example:

[Image: an example Times Haiku]

Times Haiku is a collection of what the Times is calling “serendipitous poetry,” derived from stories that have made the homepage of NYTimes.com. The haiku live on a Tumblr hosted by the Times. Harris built a script that mines stories for haiku-friendly words and then reassembles them into poetry. (For those of you who may have zoned out in class, a haiku is composed of three lines with, in order, five, seven, and five syllables.) The code checks words against an open source pronunciation dictionary, which handily also contains syllable counts.
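
To make the syllable check concrete, here is a minimal sketch in Python. It is not Harris’s code. It assumes a tiny hand-written table (the hypothetical `SYLLABLES`) standing in for the pronunciation dictionary, and a made-up helper, `split_haiku`, that accepts a sentence only when its word boundaries fall exactly at five, seven, and five syllables.

```python
import re

# Hypothetical stand-in for the pronunciation dictionary: word -> syllable count.
SYLLABLES = {
    "haiku": 2, "harvester": 3, "built": 1, "inside": 2, "the": 1,
    "new": 1, "york": 1, "times": 1, "does": 1, "it": 1, "have": 1,
    "a": 1, "soul": 1,
}


def split_haiku(sentence, syllables=SYLLABLES, pattern=(5, 7, 5)):
    """Return three lines if the sentence breaks cleanly into 5/7/5, else None."""
    words = re.findall(r"[A-Za-z']+", sentence)
    lines, current, count = [], [], 0
    for word in words:
        n = syllables.get(word.lower())
        # Reject unknown words, and any words left over after the third line.
        if n is None or len(lines) == len(pattern):
            return None
        current.append(word)
        count += n
        target = pattern[len(lines)]
        if count > target:
            return None  # a word straddles the line break
        if count == target:
            lines.append(" ".join(current))
            current, count = [], 0
    # Valid only if all three lines were filled and nothing is dangling.
    if len(lines) == len(pattern) and not current:
        return lines
    return None


print(split_haiku("Haiku harvester built inside The New York Times does it have a soul?"))
```

A sentence whose words do not land on the 5/7/5 boundaries, or that contains a word missing from the table, returns `None` and is skipped.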

“Sometimes it can be an ordinary sentence in context, but pulled out of context it has a strange comedy or beauty to it,” Harris said.

Harris was inspired by Haikuleaks, a similar project that found poetry in the cache of diplomatic cables released by WikiLeaks in 2010. The backbone of that project was an open source program called Haiku Finder, which crawls through text to generate haiku. The program was built in Python; Harris made his own version in Ruby on Rails.

The result, much like @nytimes_ebooks, is bizarre, quirky, and kind of zen. The haiku have a strange way of getting at the heart of a story, or teasing out interesting fragments from an article. “There’s something appealing about finding these snippets of text, these turns of phrase and pulling them out,” Harris said. “You find it compelling and it drives you to read the article that it came from.” (Think of it as a more lexicographically strict version of Paul Ford’s SavePublishing.)

In its own poetic way, Times Haiku will be another access point for Times stories, said Marc Lavallee, assistant editor for interactive news at the Times. “If someone sees the site, or the image of an individual haiku and shares it on Tumblr, and it gets them to think about who we are and what we do, or gives them a moment of pause, I think we’ve succeeded in a way,” Lavallee said.

Lexi Mainland, social media editor for the Times, said they wanted the poems to be able to stand on their own and be readily sharable. That’s why the haiku are actually images, which fits well with the aesthetic of Tumblr, she said. Outside of Tumblr, the Times will promote the haiku through the paper’s flagship Twitter account.

That the Times has the ability to build a haiku bot isn’t surprising. But why build a haiku bot? “A lot of the projects we work on here are these incredibly big heaves, which are very, very gratifying,” said Mainland. “But you crave these smaller projects, which are just as valuable.” Similarly, projects like the haiku bot may seem silly on the surface, but the underlying code, the use of natural language processing, or other components could be valuable to future projects, Lavallee said.

It helps that the project came at little expense to the Times: Harris put it together on his own during a fit of post-election letdown. Harris had been working on projects connected to the presidential race for over a year, and after election day suddenly found himself with idle hands. He wrote the code in November and began monitoring what it was spitting out. After he showed it to Mainland, Lavallee, and other editors, they gave the project a green light. Designer Heena Ko and software developer Anjali Bhojani gave the haiku their distinctive appearance for Tumblr. (Those lines you see running askew of the text of the haiku? The length is computer generated, based on the meter of the first line of text.)

As whimsical as a haiku bot or a spammy-sounding Twitter bot might be, both are efforts to find new uses for the Times’ vast collection of work. “It’s just this large corpus of text that gets very dizzying to look through,” Harris said.

The Times may also have a soft spot for artwork inspired by the written word. Anyone who has visited the lobby of The New York Times Building has likely seen Moveable Type, an algorithm-backed art installation that displays fragments of Times content across 560 display screens.

But why poetry? For starters, today is the first day of National Poetry Month, Mainland said. (Today is also April Fool’s — and if you were wondering, this is not a joke.) Still, for lovers of verse, it may sound like a cold and bloodless way to create poetry. Can you really create poetry without a soul? Do robots have feelings? Can they really see a sunset, or be moved by the sounds of a whale songs CD?

Harris admits the bot is imperfect; it’s required a little teaching along the way. One reason he limited the scope to the front page is that it provides an editor-picked selection, which tends toward richer features and important daily fare. (Running the bot on the Times Wire, Harris said, he often got haiku made up of basketball scores, which may be too esoteric for any lit major or stat nerd.) The algorithm is designed to toss haiku with certain sentence constructions (sentences that start with a preposition, for instance) or from sensitive stories. Mainland, Lavallee, and Harris also keep an eye on the haiku being created to see if anything untoward sneaks through.

But Harris also has to do some syllable counting himself, teaching the bot words that appear in the Times (“Rihanna,” for instance) that it doesn’t know. Henry Higgins would be proud.
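
The teaching and filtering described above might look something like the following Python sketch. Everything in it is an assumption made for illustration: the `SyllableLexicon` class, the `teach` method, the short `PREPOSITIONS` list, and the sample words are hypothetical and are not taken from the Times’ code.

```python
import re

# Illustrative, deliberately incomplete list of prepositions.
PREPOSITIONS = {"in", "on", "at", "for", "with", "from", "by", "of", "to"}


class SyllableLexicon:
    """Word -> syllable count table that can be extended by hand."""

    def __init__(self, base):
        self.counts = dict(base)

    def teach(self, word, n):
        """Add or correct a word the base dictionary does not know."""
        self.counts[word.lower()] = n

    def count(self, text):
        """Total syllables in text, or None if any word is unknown."""
        total = 0
        for word in re.findall(r"[A-Za-z']+", text):
            n = self.counts.get(word.lower())
            if n is None:
                return None
            total += n
        return total


def starts_with_preposition(sentence):
    """True if the first word of the sentence is in PREPOSITIONS."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return bool(words) and words[0].lower() in PREPOSITIONS


lexicon = SyllableLexicon({"sings": 1, "tonight": 2})
print(lexicon.count("Rihanna sings tonight"))  # unknown word, so no count yet
lexicon.teach("Rihanna", 3)
print(lexicon.count("Rihanna sings tonight"))
print(starts_with_preposition("In the lobby, screens glow"))
```

Returning `None` for an unknown word, rather than guessing its syllable count, means a candidate is dropped until someone teaches the lexicon the missing word.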
