A wave of P.R. data

“The wave of bullshit data is rising, and now it’s our turn to figure out how not to get swept away.”

Nobody can say exactly when the trend first started, but in 2014 we saw the first major outbreaks of bogus data distributed by private companies just so it would go viral online. Among the many exciting things we’ve learned this year are:

To be blunt, all of these stories were irredeemably awful, riddled with errors and faulty assumptions. But accuracy wasn’t the point. All of these examples of “data journalism” were generated by companies looking for coverage from online news organizations. The goal is a viral feedback loop, where the story is reaggregated by others, the site surges in its organic search rankings, and the study is tweeted for days even by haters like myself. For these purposes, they were perfectly designed to exploit the nature of modern news distribution online.

The old adage of “fast, cheap, good: pick two,” often used about software development, also applies to news, where good is a function not just of your current work but of your established reputation. So many news organizations on the web start at a disadvantage: their only option is to put out as many fast and cheap stories as possible to hit their monthly traffic targets. And everybody knows that posts that feature several key charts or 40 maps that explain something tend to do pretty well in traffic.

But it takes time to gather the data yourself — so it’s much better if someone provides it to you. Which is how we got here. It’s not unusual for news organizations to source data from private companies much like they would from government agencies or scientific agencies. For instance, The New York Times sources data from a company that tracks executive compensation to report on trends in CEO pay every year.

But the PR-driven data stories I listed above come from the opposite direction to traditional data journalism. This is not data that is collected and analyzed in response to specific questions and whose quality is checked before publication, but prebuilt charts pushed to news organizations like press releases and targeted at specific topics like sex, anxiety, and shame that are more likely to elicit clicks. If you’re a company looking for press, why not use those fancy data scientists you hired to also generate some free publicity outside the company? And if you’re a reporter at a news startup who needs to constantly fill the news hole with new material, why wouldn’t you run one of these? Everybody’s happy, even if the data isn’t right.

And in 2015, it will only get worse — because I’d bet the big PR firms have noticed the success of some of these smaller efforts and will try their hands at this new form of marketing. Don’t be surprised when Kraft creates a map of which states consume the most macaroni and cheese, or Starbucks releases charts showing how pumpkin spice-related products lift the American economy each fall. The wave of bullshit data is rising, and now it’s our turn to figure out how not to get swept away. Maybe Snopes sells life rafts?

Jacob Harris is a senior software architect at The New York Times.