Nieman Foundation at Harvard
Oct. 18, 2021, 9 a.m.
Reporting & Production

How A/B testing can (and can’t) improve your headline writing

“We found, surprisingly, that no single feature of a headline’s writing style makes much of a difference in forecasting success.”

What makes a good headline? Over-the-top superlatives? Clever wordplay? Eye-popping numbers?

Newsrooms grapple with this question on a daily basis. Headline writing is an art, a careful balance between creatively informing the reader and piquing their curiosity enough to click. But over the last few years headline writing has gotten a lot more analytical, relying on powerful data-driven techniques such as headline A/B testing.

Headline A/B testing lets a newsroom try out different versions of a story’s headline to see which draws in more readers. When they come to a site’s homepage, some readers might see one headline (e.g., “Here’s how to boost your site’s traffic”), while others see an alternative version (e.g., “15 ways to get more clicks”). An automated system tracks the click-through rate of each version and eventually declares a winner of the attention contest.
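To make the mechanics concrete, here is a minimal sketch of how a winner might be declared from raw counts. It assumes a two-sided two-proportion z-test at a 5% significance level; the article does not say which statistical method Chartbeat or any newsroom actually uses, and the function names and numbers are hypothetical.

```python
import math


def click_through_rate(clicks: int, views: int) -> float:
    """Share of readers who saw the headline and clicked it."""
    return clicks / views if views else 0.0


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return (z, two-sided p-value) for the difference in click-through rates."""
    p_a = click_through_rate(clicks_a, views_a)
    p_b = click_through_rate(clicks_b, views_b)
    # Pooled rate under the null hypothesis that both headlines perform equally.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def declare_winner(clicks_a, views_a, clicks_b, views_b, alpha=0.05):
    """Name a winner only if the difference is statistically significant.

    The alpha threshold is an assumption made for this illustration.
    """
    z, p_value = two_proportion_z_test(clicks_a, views_a, clicks_b, views_b)
    if p_value >= alpha:
        return "no clear winner"
    return "A" if z > 0 else "B"


if __name__ == "__main__":
    # Hypothetical counts: headline A gets 120 clicks from 1,000 views, B gets 80.
    print(declare_winner(120, 1000, 80, 1000))
    # A near-tie should not produce a winner.
    print(declare_winner(100, 1000, 102, 1000))
```

Note that a test like this only says which headline did better with this audience during this window, which is the point the rest of this piece turns on.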

Many large newsrooms take advantage of this practice to optimize their traffic. The New York Times tests headlines extensively on its homepage, as does The Boston Globe, which also experiments with visuals and layout. Analytics tools offering A/B testing, such as Chartbeat, now make it widely available to a range of newsrooms.

Beyond testing individual headlines, newsrooms try to use A/B testing data to draw out general guidelines around effective headline writing. In prior work, we talked to editors with all kinds of data-driven headline recommendations: including salient quotes and numbers, starting explanatory headlines with “how” or “why,” referencing important people and organizations by name, and others. When used in this way, the thinking goes, tests provide value beyond the individual stories tested. They make the newsroom more headline-savvy as a whole.

Generalizing headline test data is a common practice. But A/B tests are meant to optimize for specific cases. Their results don’t necessarily extrapolate beyond the actual story being tested. We wanted to see if newsrooms’ use of test data holds water — if we could make general recommendations for effective headline writing. Are general headline writing guidelines gleaned from A/B tests a safe bet?

In a new research paper published in Digital Journalism, we partnered with Chartbeat to analyze 141,000 headline A/B tests run by 293 news websites. Across all these tests, we looked for approaches that might work consistently: using an attention-grabbing detail, throwing in key phrases like “here’s why” or “this is how”, making it extra positive (or extra negative?), and so on. We examined all these recommendations and more, broadly considering headlines’ language and grammar, how well they conveyed key aspects of newsworthiness, and the specific words they used.
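As an illustration of what such headline features can look like in practice, the sketch below turns a headline into a few simple measurements drawn from the guidelines mentioned above (length, numbers, “how”/“why” openers, words like “here” and “this”). It is not the feature set used in the paper; the feature names and word lists are assumptions chosen for the example.

```python
import re

# Hypothetical word lists, based only on the examples mentioned in this article.
EXPLANATORY_OPENERS = {"how", "why"}
CLICKBAIT_WORDS = {"here", "this"}


def extract_features(headline: str) -> dict:
    """Compute a handful of surface-level features for one headline."""
    # Lowercased alphanumeric tokens, used for word-membership checks.
    tokens = re.findall(r"[a-z0-9]+", headline.lower())
    return {
        # Whitespace-separated words, so "Here's" counts as one word.
        "word_count": len(headline.split()),
        "has_number": any(ch.isdigit() for ch in headline),
        "starts_with_how_or_why": bool(tokens) and tokens[0] in EXPLANATORY_OPENERS,
        "has_clickbait_word": any(tok in CLICKBAIT_WORDS for tok in tokens),
    }


if __name__ == "__main__":
    for text in ("Here's how to boost your site's traffic",
                 "15 ways to get more clicks"):
        print(text, extract_features(text))
```

Features like these are easy to compute, which is part of why they are tempting to treat as rules.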

Using a machine-learned model trained on the Chartbeat data and these various aspects of headline writing, we then tried to predict the outcomes of A/B headline tests. Could we predict the winner of a headline test before running it, relying only on headline text to forecast success?

We found, surprisingly, that no single feature of a headline’s writing style makes much of a difference in forecasting success. There was some evidence for industry wisdom: Negative headlines sometimes did better, as did shorter and simpler ones. Clickbait-y words like “here” and “this” do (unfortunately from a user experience perspective) help. But when we tried to use these features to predict the outcomes of headline A/B tests, they were really limited. In fact, the vast majority of features didn’t improve our model’s predictive performance at all. There was no silver bullet that universally improved a headline’s performance.

Even more striking was that the winning headline often depends on factors entirely outside the writing. We found thousands of cases where editors repeated tests, trying the exact same headlines. But the winners of these repeat tests weren’t stable — a majority changed winners about a quarter of the time. It seems that a large portion of audiences’ decisions around what to click to read are out of journalists’ control.
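One way to build intuition for unstable repeat tests is a small simulation. The sketch below is not the analysis from the paper, and it models only one possible source of instability: random sampling noise when two headlines have similar underlying click-through rates. The rates, sample sizes, and function names are all hypothetical.

```python
import random


def simulate_clicks(rng: random.Random, true_ctr: float, views: int) -> int:
    """Simulate how many of `views` readers click, each with probability `true_ctr`."""
    return sum(1 for _ in range(views) if rng.random() < true_ctr)


def run_test(rng: random.Random, ctr_a: float, ctr_b: float, views: int) -> str:
    """Run one simulated A/B test and return "A", "B", or "tie" by raw click count."""
    clicks_a = simulate_clicks(rng, ctr_a, views)
    clicks_b = simulate_clicks(rng, ctr_b, views)
    if clicks_a == clicks_b:
        return "tie"
    return "A" if clicks_a > clicks_b else "B"


def flip_rate(ctr_a: float, ctr_b: float, views: int = 1000,
              pairs: int = 200, seed: int = 0) -> float:
    """Fraction of repeated test pairs whose two outcomes disagree.

    A pair also counts as a flip when one run ties and the other does not.
    """
    rng = random.Random(seed)
    flips = 0
    for _ in range(pairs):
        first = run_test(rng, ctr_a, ctr_b, views)
        second = run_test(rng, ctr_a, ctr_b, views)
        if first != second:
            flips += 1
    return flips / pairs


if __name__ == "__main__":
    # Two headlines with nearly identical appeal: outcomes flip often.
    print("similar headlines:", flip_rate(0.050, 0.052))
    # Two headlines with very different appeal: outcomes are stable.
    print("very different headlines:", flip_rate(0.02, 0.20))
```

In this toy setup the same pair of headlines can trade places purely by chance, before accounting for anything else that changes between tests, such as the time of day or the news cycle.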

In light of this, news organization practices around absorbing lessons from headline tests become a little concerning. A newsroom might run a few tests trying to determine if positive or negative headlines do better. Maybe in those cases, negative headlines win out. Does that result really mean that all your headlines should slant negative? Does it even mean that the writing style of the headline mattered at all, or that the A/B test actually measured readers’ response to headline sentiment?

Editors shouldn’t be so quick to jump to conclusions. Our results suggest that interpreting and extrapolating A/B test results like that is fraught, and might even lead to bad recommendations. So-called “best practices” can propagate without any basis in audiences’ real preferences. Headline writing only accounts for a small slice of what predicts a winning headline.

The good news, though, is that headline A/B testing is quite effective at its intended purpose: finding the headline that garners more traffic from the audience during the window of time when the test is run. A/B tests can increase traffic to news stories; we saw a 20%+ lift in click-through rate for winning headlines. For key stories, like a major investigative piece, running a headline without A/B testing can leave web traffic on the table.
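For readers unfamiliar with the term, “lift” is usually computed as the relative improvement of one click-through rate over another. The sketch below assumes that definition (winner compared with loser); the article does not spell out the exact formula used in the study, and the counts are hypothetical.

```python
def relative_lift(winner_clicks: int, winner_views: int,
                  loser_clicks: int, loser_views: int) -> float:
    """Relative improvement of the winner's click-through rate over the loser's."""
    winner_ctr = winner_clicks / winner_views
    loser_ctr = loser_clicks / loser_views
    if loser_ctr == 0:
        raise ValueError("lift is undefined when the baseline rate is zero")
    return (winner_ctr - loser_ctr) / loser_ctr


if __name__ == "__main__":
    # Hypothetical test: 60 clicks per 1,000 views versus 50 per 1,000.
    lift = relative_lift(60, 1000, 50, 1000)
    print(f"lift: {lift:.0%}")  # roughly 20%
```

A 20% lift sounds large, but on a baseline of 5% it means moving to about 6%, so the absolute change in clicks depends heavily on how many readers see the story.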

In our prior research, newsroom editors reported that introducing A/B testing for some headlines improved the click-through rate of stories across the board, even those that weren’t tested. There is value in constantly iterating, experimenting, and gauging readers’ reactions, just as there is value in inviting reporters and editors to think critically about headline construction. In our observations, those strategies are more effective than trying to pinpoint words, phrases, or writing styles that ostensibly guarantee an improvement in headline performance across the board.

For newsrooms, then, our research offers a clear recommendation: A/B test your headlines to find the right one to optimize traffic in the given context and moment, but be cautious about trying to divine general writing lessons from those tests. Even if a writing style does well in tests, it might not generalize across the whole newsroom. A test can hinge on factors entirely outside an editor’s control, leading to faulty interpretations and bad recommendations. Headline testing is a powerful tool, but it’s most effective when paired with a deep understanding of your audience.

Nick Hagar is a PhD candidate in the Computational Journalism Lab at Northwestern University. Nick Diakopoulos is the director of the Computational Journalism Lab and an Associate Professor in Communication Studies and Computer Science at Northwestern University. He is also the author of Automating the News: How Algorithms are Rewriting the Media.

Photo of Newspaper headlines by m01229 on Flickr used under a Creative Commons license.
