Viktor Mayer-Schönberger and Kenneth Cukier published their joint tome on big data this week, Big Data: A Revolution That Will Transform How We Live, Work and Think. Mayer-Schönberger, a professor of Internet governance and regulation at Oxford, and Cukier, the data editor of The Economist, argue that having access to vast amounts of data will soon overwhelm our natural human tendency to look for correlation and causality where there is none. In the near future, we’ll be able to rely on much larger pools of “messy” data rather than small pools of “clean” data to get more accurate answers to our questions.
“We are taking things we never thought of as informational and rendering them in data,” Mayer-Schönberger said in a talk Wednesday at the Berkman Center for Internet & Society at Harvard. “Once we think of it as data, we can organize it and extract new information.”
In their book, Mayer-Schönberger and Cukier give a number of examples of industries that will be changed forever by the new messiness of data. Bradford Cross cofounded FlightCaster.com, which predicted U.S. flight delays using data about flight times and weather patterns. The company was sold in 2011, at which point “Cross turned his sights on another aging industry.” He started Prismatic, one of a number of news aggregators that filter content for users by analyzing data about sharing frequency on social networks and user preferences. Mayer-Schönberger and Cukier write:
This is a humbling reminder to the high priests of mainstream media that the public is in aggregate more knowledgeable than they are, and that cuff-linked journalists must compete against bloggers in their bathrobes. Yet the key point is that it is hard to imagine that Prismatic would have emerged from within the media industry itself, even though it collects lots of information. The regulars around the bar of the National Press Club never thought to reuse online data about media consumption. Nor might the analytics specialists in Armonk, New York or Bangalore, India have harnessed the information in this way. It took Cross, a louche outsider with disheveled hair and a slacker’s drawl, to presume that by using data he could tell the world what it ought to pay attention to better than the editors of The New York Times.
I didn’t ask Cukier on Thursday whether he is a Press Club bar regular, but I did ask him how he sees data affecting his work at The Economist:
One interesting thing is, I’ve taken a lot of the data from the books and arts section related to what’s popular online — online-only content — versus what is popular in print, and looking for interesting anomalies. It turns out among our most interesting correlations or findings is that our Q&As with authors are extremely popular on the Internet, and it is one of the formats with which we almost never run in the paper. So, it’s suggesting to us that we may want to take this format that’s really popular online into the paper.
Cukier is aware that, for many editors, the idea that an algorithm might know something about what your readers want can be a hard pill to swallow. In an interview with Wired, Cukier said that when speaking about the book in Britain, humanities professors often reproached him for propagating the idea that the quality of their work could be quantifiable. “I’d think it’s actually very reasonable if you’re going to produce something like art, that you try to look for ways to improve it and understand it by, if you will, how many people it reaches, how many times it’s been shared on the Internet,” he said. On Thursday, Cukier extended that same line of thought to the work of journalists in the digital age:
If you are the books and arts editor, to say, you know, “I really sort of believe that I know what is best on a weekly basis for my audience” — to actually second guess yourself and say, “I recognize that in some ways I have good instincts, but in some ways I am blindfolded. I am not going to blindly accept the data, but I’m not going to be blind to it either.” These are techniques I think a lot of media companies are using — something The Economist is walking very light-footedly into, to be careful, but I think it’s necessary.
He added a note of caution — the talk was called “Big Data — and its Dark Side,” after all — pointing out that the reader who pays for a print subscription to The Economist is quite different than the one who sees a link to an Economist story on Facebook and clicks. But ultimately, “moneyballing” print publications is the new reality, he says, replacing cocktails or Sunday brunch as the best method of understanding your reader.
Cukier is a journalist, not just a numbers man, having been The Economist’s business correspondent in Japan and global technology correspondent before that. So when he looks at how data is changing how we think, he also thinks about what journalists can do with it:
When we teach journalism in the future, we’re not just going to teach people the fundamentals of how to do an interview, or what a lede paragraph is. We’re going to tell people how to interview databases. And also, just as we train journalists by telling them that sometimes people that we interview are unfaithful and lie, we’re going to have to teach them to be suspicious of the data, because sometimes the data lies, too. You have to bring the same scrutiny as in the analog world — talking to people and observing — to the data as well.
Mayer-Schönberger and Cukier shared plenty of dark scenarios of where big data could lead us — to a world where you can be imprisoned not on the basis of crimes you did commit but crimes you might commit, or a world in which the owners of data begin to look and act like the railroad barons of the 19th century. But despite these predictions, Mayer-Schönberger and Cukier are ultimately closer to being a pair of cheerleaders than naysayers. Said Mayer-Schönberger: “The culprit here is not big data itself, but how we use it.”
Photo of Kenneth Cukier by Joi Ito used under a Creative Commons license.