Nieman Foundation at Harvard
Nov. 29, 2012, 9:54 a.m.

The newsonomics of going deeper

Technology is aiding reporting at both the high and the low ends of the business.

The news industry appears to be having another one of its Admiral Stockdale moments. Who am I? Why am I here?

From Columbia’s “Post-Industrial Journalism: Adapting to the Present” report (“A new Columbia report examines the disrupted news universe”) to information overload, the basic roles of what news companies should do for readers and citizens seem once again at issue.

Without debating that here, let me point to one answer very much in formation: Go deeper.

Going deeper means many things, from national investigative reporting to hyperlocal community info. Increasingly, it will be sports and features and entertainment as well. What I’m particularly intrigued about is how technology is rapidly improving the trade’s ability to go deeper — and to go deeper faster and cheaper. (A couple of decades ago in Portland, I recall seeing a housepainter’s business card. At each of its four corners was a single word: Faster, Cheaper, Better, Sooner. I always thought that had universal application.)

You’ve read about some of this, with the “Robots Ate My Newspaper” headlines this summer as the Journatic faked-bylines scandal fueled popular dismay. Well beyond the headlines lies a bigger movement. It’s not quite a computer-generated revolution, though technology aids, assists, and adjusts our thinking as to what’s possible. As we look at a few of the data points in the newsonomics of going deeper, let’s remember why this is important.

First: Readers are fast becoming the primary revenue source for newspapers (“The newsonomics of majority reader revenue”).

Second: We live in an age of way too much. People want context, not more content.

Third: Creating content is too expensive. In the age of low-cost aggregation and low-cost user- and citizen-generated content, any way of reducing editorial labor costs or maximizing productivity to produce good, differentiating news content is a necessity.

Fourth: It’s a business differentiator for all media — TV, newspapers, and more — in a world of too much.

Hearst Television News VP Candy Altman makes the last point succinctly: “The only way to differentiate yourself in this fragmented world is through the best content. We have some very strong investigative units in our company, but investigative reporting tools can and should be used in all of our reporting.” Consequently, Hearst TV teamed up with Investigative Reporters and Editors (IRE) on four regional workshops focused on techniques for using data mining in investigative reporting.

Hearst isn’t the only local broadcaster upping its game. The 10 NBC-owned local news stations, in major markets from L.A. to New York, have doubled the number of their editors, reporters, and shooters devoted to investigative work in a single year; they now include 62 staffers. About a third of them attended a multi-day IRE workshop as well. (A recent Dallas-Fort Worth story led to the Fort Worth police department banning its officers themselves from texting while driving, which reporting showed had led to 15 accidents.)

Gannett and McClatchy are among the other news companies that have invested in more investigative training for their staffs.

Another sure indicator is IRE’s own membership rolls. A veteran trade group, IRE had seen its membership suffer along with the industry: From about 5,000 members in 2005, it was down to 3,400 in 2009. Now it’s back in the vicinity of 4,300, says Mark Horvit, IRE executive director. Just as important, the kind of technology-aided work IRE focuses on is changing. Its conferences and data training sessions now reach beyond traditional techniques.

Investigative journalists have long focused on existing databases, government and otherwise, “mining” that “structured data” (already in fields or categories). That work continues. What’s growing rapidly is the effort to get at unstructured data; that’s where the “pioneering” work is being done, says Horvit. Think emails, legislative bills, government bureau and court documents, press releases, you name it: anything written in unstructured prose.

“It’s a higher degree of math difficulty, to be sure,” says Chase Davis, director of technology at the Berkeley-based Center for Investigative Reporting (CIR). Yet his data team of four and many dozens, if not hundreds, of journalists across the country are now applying machine learning, natural language processing (NLP), and document clustering to their work. All those terms have specific meanings, and there is yet more jargon that makes sense mainly to its practitioners. For the rest of us, it’s important to understand this: Well-programmed technology can do a lot of journalistic heavy lifting.

In part, all the technological innovation simply lets smart journalists ask better questions and get faster results. It both allows journalists to get at questions they know they’d like to answer and goes a step beyond.

Getting at unstructured data opens inquiry to lots of content previously beyond reach. Machine learning, says Davis, “allows datasets to tell you their stories. You don’t have to be limited by your own experience.”
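To make “document clustering” a little more concrete, here is a minimal sketch in Python. It is not the method CIR, IRE, or ProPublica uses; the word-count vectors, the greedy grouping, the 0.3 threshold, and the sample headlines are all assumptions chosen for illustration. The idea is only that software can group unstructured text by shared vocabulary without being told the topics in advance.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Lowercase the text, split it into words, and count each word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 = no shared words)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def cluster_documents(docs, threshold=0.3):
    """Greedy single-pass clustering (an illustrative choice, not a standard):
    put each document in the first cluster whose first document is similar
    enough, otherwise start a new cluster."""
    vectors = [term_vector(doc) for doc in docs]
    clusters = []  # each cluster is a list of indexes into docs
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine_similarity(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Made-up sample documents, standing in for emails or press releases.
docs = [
    "Zoning board approves permit for new downtown stadium",
    "Stadium permit approved by zoning board after downtown hearing",
    "School district announces new reading program for third graders",
    "Third graders to start new reading program, school district says",
]
clusters = cluster_documents(docs)
print(clusters)
```

On these four headlines the two stadium items land in one group and the two school items in another. Production systems typically weight words (for example with TF-IDF) and use sturdier clustering algorithms, but the principle of letting the text sort itself is the same.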

For instance, analyzing a congressman’s emails may yield patterns of contacts journalists didn’t even know to ask about. Doing an algorithmic dive into campaign records, as IRE and CIR did using Kaggle (which turned data science into sport, as amateurs could take on statistical wizards), produced all kinds of trends in campaign finance that journalists hadn’t yet considered. ProPublica’s Message Machine unearthed facts about how 2012 political targeting was really working, after first using crowdsourcing to gather many of the presidential campaign pitches citizens were receiving. Jeff Larson, a ProPublica news apps developer who straddles the line between journalist and techie, said the nonprofit then reverse-engineered the emails, using both machine learning and NLP to find patterns, make sense of them, and produce stories on the changed nature of presidential marketing. (Former Wall Street Journal publisher Gordon Crovitz gives a good overview of the Obama campaign’s huge data advantage, and how it was built; note how far behind that campaign the U.S. news industry finds itself in smart targeting.)

This pioneering work “opens up fantastic new avenues for looking for trends, for finding the hidden story,” says IRE’s Horvit. “You can stare at a spreadsheet ’til your eyes pop out. If you use software intelligently [with structured content], it pulls out the story for you. If you can develop software — and they are — that deals with large amounts of text, you get that quantum leap.”

High-minded issues of national public impact, such as campaign spending, Big Pharma’s payments to doctors, and national security (a Center for Public Integrity focus area), are one hot area here. Another is at the other end of the spectrum: local and hyperlocal news and information. Journatic CEO Brian Timpone is at the forefront of the work and the thinking here.

Put aside whatever you think about the company’s byline scandal and focus on what Journatic does. Timpone talks about becoming the “Bloomberg of Local.” His vision is to sweep up all kinds of local information that has only haphazardly rolled into newspapers over the years. For starters, that’s school notes, book club information, parish reports, real estate listings, PTA and library newsletters, times 100. In a community of 30,000 people, Timpone notes, there may be 750 organizations, and they all generate information. That’s the kind of work Journatic does with both Tribune (example: Newport News local) and Hearst (example: Ultimate Katy).

The process is an orderly one. Identify the sources of the needed local info, and get the flow of it started through outreach. Then, collect and “clean” the data, so that it is readable; Journatic’s use of offshore labor is involved here. Then, it’s structured, “breaking it into datapoints,” with editors and algorithm writers in the U.S. doing that work, says Timpone. Part of that work is creating “metrics on top of the data,” looking for newsy patterns. Yes, it’s about real estate and prep sports, but it can also be purposed beyond that, in ways that sound like the work the national investigative organizations are doing. Timpone says Journatic can answer the question: “Which people in the 19th Ward in Chicago donate the most per capita to political campaigns, using property tax values as an indicator of wealth?”
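Timpone’s 19th Ward question is a good example of a “metric on top of the data.” The sketch below shows one way such a metric might be computed once the records are cleaned and structured. Everything in it is hypothetical: the donor names, amounts, and property values are invented, the function name is made up, and matching donors to property records by name is a simplification. Journatic’s actual method is not described here.

```python
from collections import defaultdict

# Invented records standing in for cleaned, structured data.
donations = [
    {"donor": "A. Rivera", "ward": 19, "amount": 500.0},
    {"donor": "A. Rivera", "ward": 19, "amount": 250.0},
    {"donor": "B. Chen", "ward": 19, "amount": 1000.0},
    {"donor": "C. Okafor", "ward": 11, "amount": 5000.0},
]
# Assessed property value per donor, used as a rough proxy for wealth.
property_values = {
    "A. Rivera": 150_000.0,
    "B. Chen": 400_000.0,
    "C. Okafor": 300_000.0,
}

def rank_donors_by_share_of_wealth(donations, property_values, ward):
    """Total each donor's giving within one ward, divide by property value,
    and return (donor, total, share) tuples sorted by share, highest first."""
    totals = defaultdict(float)
    for row in donations:
        if row["ward"] == ward:
            totals[row["donor"]] += row["amount"]
    ranked = []
    for donor, total in totals.items():
        value = property_values.get(donor)
        if value:  # skip donors with no matching or zero-valued property record
            ranked.append((donor, total, total / value))
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked

ranking = rank_donors_by_share_of_wealth(donations, property_values, ward=19)
for donor, total, share in ranking:
    print(f"{donor}: ${total:,.0f} donated, {share:.2%} of property value")
```

In this toy data the smaller donor ranks first, because the gift is larger relative to the property value. That is the kind of pattern a raw list of donation totals would hide.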

In Houston alone, working with the Houston Chronicle, Journatic has received more than a million emails from community groups within the past three years, each offering some kind of community information. Timpone makes the point that it’s not just the receiving, cleaning up, and routinizing of the data in the emails; it’s about learning about those emails and their senders over time. If a church sends a weekly email, with community information, the system learns about those submitting info.

“We know how to treat it next time,” which makes a big cost difference to high-throughput Journatic. “Processing time is a big deal to us.”
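The “know how to treat it next time” idea can be pictured as a simple memory keyed by sender. The sketch below is an assumption about the general shape of such a system, not a description of Journatic’s software; the class name, the `ask_editor` stand-in for human review, and the sample address are all hypothetical.

```python
class SenderMemory:
    """Remembers how mail from each sender was categorized the first time."""

    def __init__(self):
        self._rules = {}  # normalized sender address -> category

    def route(self, sender, classify):
        """Return (category, was_known). The costly classify step runs only
        the first time a given sender is seen."""
        key = sender.strip().lower()
        if key in self._rules:
            return self._rules[key], True
        category = classify(sender)
        self._rules[key] = category
        return category, False

review_log = []  # records every time the slow, manual step is needed

def ask_editor(sender):
    """Hypothetical stand-in for a person deciding what kind of mail this is."""
    review_log.append(sender)
    return "church bulletin"

memory = SenderMemory()
first = memory.route("Office@StMarys.example", ask_editor)
second = memory.route("office@stmarys.example ", ask_editor)
print(first, second, len(review_log))
```

The second email from the same church skips the manual review entirely, which is where the savings in processing time would come from.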

Another early player, Narrative Science, after making early waves in the local news space, seems today more focused on retail and financial markets. Make no mistake: The techniques we’re talking about here are roiling many other industries as business intelligence gets a complete makeover, due to data mining.

Then there are the many in-between uses. My fellow former and current features editors will find fertile ground in machine learning. One reason I know that is that Silicon Valley software companies are already talking about how to mine content, from newspapers and many other sources, to produce automated Top 10 lists. Yes, Top 10 lists, a staple of feature sections and monthly magazines forever. Such thinking buttresses another Timpone point: Why rely on the memory of an individual reporter or editor, when you can have trained algorithms search through deep databases of content to produce all kinds of content, including such Top 10s as top vacation spots, schools, parks, beloved local musicians, and much more?

It’s a new age, one with great potential to go deeper, broader, and smarter. With new tech assists, we may have new antidotes for journalism that can be too shallow, too narrow, and too dumb.
