Nieman Foundation at Harvard
Oct. 31, 2016, 10:51 a.m.

The AP wants to use machine learning to automate turning print stories into broadcast ones

The experiment is part of a larger effort by the news agency to incorporate automation into its journalism.

On average, when an AP sportswriter covers a game, she produces eight different versions of the same story. Aside from writing the main print story, she has to write story summaries, separate ledes for both teams, convert the story to broadcast format, and more.

“It’s a manual labor nightmare,” Jim Kennedy, the AP’s senior vice president for strategy and enterprise development, told me in his New York office. Collectively, AP journalists spend about 800 hours a week converting print stories to broadcast format.

As a result, the AP is experimenting with machine learning in an attempt to automate some of those processes. The news agency wants to free up journalists' time while also increasing its output, as it looks to offer clients new types of coverage and grow its business.

Kennedy said the AP would like to automate 80 percent of its content production by 2020, though he admits that specific goal is “more aspirational than real.”

“Can we address that and start to shave time off of it so that person can do more? That same person who is sitting there churning out all that crap can take his iPad, go down to the locker room and capture video if he or she is not sitting there doing eight versions of the text story,” Kennedy said. “There is real benefit to be realized by doing this.”

(After this story was published, an AP spokeswoman emailed to say that, despite what Kennedy said, 80 percent is “not a goal but a reference to the need to manage increasing volumes of additional content that are expected in the future.”)

Over the past several months, as part of its work with Matter Ventures, a cross-sectional team of five AP staffers has been working on developing a framework to automate the process of converting print stories to broadcast format.

The team built a prototype that only identifies the elements in print stories that need to be altered for broadcast. (Broadcast stories are shorter, sentences are more concise, attribution comes at the beginning of a sentence, numbers are rounded, and more.) Though the tool can identify those items, AP strategy and development manager Francesco Marconi, who worked on the project, cautioned that the news agency has yet to conduct real trials on the tool or run quality control tests.
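As a rough illustration of what identifying such elements could look like, here is a minimal Python sketch. It is not the AP's prototype: the function name `flag_for_broadcast`, the 20-word limit, and the regular expressions are all assumptions made up for this example, and they cover only three of the conventions listed above (long sentences, unrounded numbers, attribution at the end of a sentence).

```python
import re

# Hypothetical threshold and patterns, chosen for illustration only.
MAX_WORDS = 20
# Numbers with thousands separators (18,342) or decimals (3.75).
PRECISE_NUMBER = re.compile(r"\b\d{1,3}(?:,\d{3})+\b|\b\d+\.\d+\b")
# Sentences ending in ", <name> said." or ", said <name>."
TRAILING_ATTRIBUTION = re.compile(
    r"[,”\"]\s+(?:(?:\w+\s+){1,4}said|said\s+[\w\s.]+?)\.?$"
)


def flag_for_broadcast(sentence: str) -> list[str]:
    """Return labels for elements an editor might change for broadcast."""
    flags = []
    if len(sentence.split()) > MAX_WORDS:
        flags.append("long_sentence")
    if PRECISE_NUMBER.search(sentence):
        flags.append("unrounded_number")
    if TRAILING_ATTRIBUTION.search(sentence.strip()):
        flags.append("trailing_attribution")
    return flags


if __name__ == "__main__":
    example = "The crowd of 18,342 was the largest of the season, coach Pat Lee said."
    print(flag_for_broadcast(example))
```

A sketch like this only flags candidates. It does not rewrite anything, which matches the limited scope Marconi describes for the prototype.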


To move forward with the project, the AP will have to partner with an outside company that specializes in machine learning. To start, it would focus on one specific sport and use archival versions of print and broadcast stories to develop an algorithm that could actually automate the versioning.

“There are a set amount of rules that our journalists know that they use to turn a print story into a broadcast story,” Marconi said. “That transition is not always clean. What machine learning can help us do is that essentially there is an algorithm that compares a print story with the same story in a different version. For example, for broadcast, it identifies how a human would make those changes. Of course, the machine will be first guided by these set of rules but to get to a version that our editorial department is happy with we need to do that type of work of really teaching the machine the nuances.”
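The comparison Marconi describes, lining up a print story against its broadcast version to see what a human editor changed, can be sketched with Python's standard `difflib` module. This is an assumption-laden illustration, not the AP's method: the function name `edit_operations` and the sample sentences are invented, and a real system would learn from many such pairs rather than inspect one.

```python
import difflib


def edit_operations(print_text: str, broadcast_text: str) -> list[tuple[str, str, str]]:
    """List the word-level changes between a print and a broadcast version.

    Each item is (operation, print words, broadcast words), where the
    operation is "replace", "delete", or "insert".
    """
    print_words = print_text.split()
    broadcast_words = broadcast_text.split()
    matcher = difflib.SequenceMatcher(a=print_words, b=broadcast_words, autojunk=False)
    operations = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            operations.append(
                (tag, " ".join(print_words[i1:i2]), " ".join(broadcast_words[j1:j2]))
            )
    return operations


if __name__ == "__main__":
    # Hypothetical sentence pair, not taken from AP copy.
    print_version = "The Hawks beat the Owls 101-98 on Friday"
    broadcast_version = "The Hawks beat the Owls 101-98"
    print(edit_operations(print_version, broadcast_version))
```

Collected over an archive of paired stories, operations like these would be one possible source of examples for teaching a model the nuances that fixed rules miss.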

Last year, the AP created a five-year strategic plan to outline its company-wide goals through 2020. Last week, the news agency’s leadership met to begin deciding what it will prioritize in 2017 as it works toward those 2020 goals. Kennedy said there are seven initiatives under consideration for funding next year, including this effort around automation. Customer engagement and user-generated content are some of the other areas that the AP is focused on, but Kennedy said he’s hopeful the agency will continue to fund the work around automation, which began in earnest in 2014.

That year, working with the company Automated Insights, the AP began automating some corporate earnings stories. The news agency now produces about 4,000 corporate earnings stories each quarter — ten times more than when it just had reporters writing stories. This year, it also started working with Automated Insights to cover minor league baseball games by turning data from box scores into text stories. The AP has also invested in the company.
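Turning box score data into a text story can be pictured as filling a sentence template from structured fields. The sketch below is a deliberately simple assumption, not Automated Insights' system: the function name `game_recap`, the dictionary layout, and the team names are all hypothetical.

```python
def game_recap(box_score: dict) -> str:
    """Build a one-sentence recap from a minimal, made-up box score layout."""
    home = box_score["home"]
    away = box_score["away"]
    # Baseball games do not end in ties, so a strict comparison is enough here.
    if home["runs"] > away["runs"]:
        winner, loser = home, away
    else:
        winner, loser = away, home
    return (
        f"{winner['team']} beat {loser['team']} "
        f"{winner['runs']}-{loser['runs']} on {box_score['day']}."
    )


if __name__ == "__main__":
    # Hypothetical data for illustration.
    sample = {
        "day": "Saturday",
        "home": {"team": "Springfield", "runs": 5},
        "away": {"team": "Riverton", "runs": 3},
    }
    print(game_recap(sample))
```

Production systems presumably use far richer templates and data, but the basic step of mapping fields to sentences is the same idea.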

“With minor league baseball, we never had any stories and didn’t cover it with humans,” Kennedy said. “Now we’re covering it without humans and creating stories that we didn’t have before. That’s been a real breakthrough. Based on that success, we wanted to see what else could we do in automation.”

Kennedy said he expects the process of working with an outside firm to develop the automated print-to-broadcast conversion to take about six months. The current prototype is pretty basic, and for the project to be successful, the broadcast stories will need to be good enough that human editors won’t have to edit them before they go out on the wire.

“It might not take that long to do the simple task of turning a print story into a broadcast story — we’re thinking that’s pretty easy — but still, you have to teach it all the different kind of outputs that you might encounter sport-by-sport,” Kennedy said. “If you read all 4,000 of those corporate earnings stories, you’d see they’re pretty similar. Such and such beat expectations or didn’t beat expectations. That’s the lede of every single story. That’s not going to be the case as we do human-composed stories that we want to change into human-voiced stories.”

To reach its goal of having 80 percent of its coverage be automated by the end of the decade, the AP is looking at other types of automation beyond the print-to-broadcast versioning. For instance, it thinks it could ultimately use machine learning to adapt stories for different devices — from wearables to voice-activated speakers in the car or home.

Another use of automation would be to customize coverage for different clients or audiences. AP sportswriters already produce stories with “hometown ledes” for fans of both teams in a game. Reporters also use a different style when writing for a domestic or international audience. These are aspects that could be automated. Additionally, the AP thinks it could potentially use machine learning to write its stories in different voices for specific clients.

“Let’s say a newspaper or website is subscription based and, in its research, it’s identified five personas that are their typical subscribers,” Kennedy said. “We could literally, if we perfect this, design output for those five personas and drive some or all of our output through those five personas.”

The AP, Kennedy said, sees this type of versioning as a key element of its future. The wire service used to produce distinct versions of stories, for morning and afternoon newspapers, for instance. But with the emergence of the Internet and the constant news cycle, the AP abandoned many of those practices in favor of “real-time versioning” and regularly updated stories.

“By creating one version of that realtime news report, we were also contributing to its commoditization,” Kennedy said. “All of our customers had moved to a single competitive space, which is now defined by the mobile-desktop space, but principally mobile.”

As newspapers continue to face economic challenges, Kennedy admitted that “new revenue is going to be slow to materialize and grow.”

There have been stories for years about newspapers ditching the AP, but the news agency is hopeful that new product offerings — including the automation efforts — will entice papers and other clients to use the service.

For example, when it comes to user-generated content, the AP worked with SAMDesk, a social media CMS in which it has invested, to create a tool that combines the AP wire with SAMDesk’s feed of user-generated content that it has authenticated or is attempting to authenticate. It initially created this tool for internal use, but it thinks it could also license it to clients.

“We’ve got big chunks falling out because newspapers continue to slide downhill, broadcasters are consolidating — and that will continue — and then native digital will be looking at these same solutions we’re looking at,” Kennedy said. “How do we stay relevant in all of that and how do we create value that could turn traditional customers around to want to buy more?”

Photo by Michael Dain used under a Creative Commons license.
