Robot journalists are having a moment. The idea of narrative stories built by algorithms opens up a lot of possibilities, and it taps into a lot of fears among human journalists worried about becoming outmoded.
But many of the most prominent algorithms doing journalism are black boxes, unknowable to the public. How do robot journalists do their work?
Nick Diakopoulos has an interesting piece that uses patent filings to outline the basics. Based on what Narrative Science has filed, the process runs in five steps: “(1) ingest data, (2) compute newsworthy aspects of the data, (3) identify relevant angles and prioritize them, (4) link angles to story points, and (5) generate the output text.” (He’s got far more detail in his post.)
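To make those five steps concrete, here is a toy sketch in Python. It is not Narrative Science's method: the angle table, the 1.5 threshold, the sentence templates, and every function name are invented for illustration, and "newsworthy" is reduced to how far the latest score deviates from a team's past scores.

```python
from statistics import mean, pstdev

# Hypothetical angle table: (name, test on the deviance score, sentence template).
# List order doubles as priority; the 1.5 threshold is an arbitrary assumption.
ANGLES = [
    ("surge", lambda z: z >= 1.5,
     "{team} scored {latest}, far above its average of {avg:.1f}."),
    ("slump", lambda z: z <= -1.5,
     "{team} managed only {latest}, well below its average of {avg:.1f}."),
    ("routine", lambda z: True,
     "{team} scored {latest}, close to its average of {avg:.1f}."),
]


def ingest(raw):
    """(1) Ingest data: parse a comma-separated string of scores."""
    return [float(x) for x in raw.split(",")]


def newsworthiness(history, latest):
    """(2) Compute a newsworthy aspect: deviance of the latest score, as a z-score."""
    spread = pstdev(history)
    return 0.0 if spread == 0 else (latest - mean(history)) / spread


def pick_angles(z):
    """(3) Identify the angles that apply, already in priority order."""
    return [(name, template) for name, test, template in ANGLES if test(z)]


def link_story_points(angles, facts):
    """(4) Link the top-priority angle to the facts that will fill its template."""
    name, template = angles[0]
    return name, template, facts


def generate(template, facts):
    """(5) Generate the output text."""
    return template.format(**facts)


def write_story(team, raw):
    """Run all five steps; the last score is the game being reported."""
    scores = ingest(raw)
    history, latest = scores[:-1], scores[-1]
    z = newsworthiness(history, latest)
    angles = pick_angles(z)
    facts = {"team": team, "latest": int(latest), "avg": mean(history)}
    name, template, facts = link_story_points(angles, facts)
    return name, generate(template, facts)
```

Calling `write_story("Rovers", "2,3,2,3,9")` picks the "surge" angle, because 9 sits far above the earlier average of 2.5. The sketch assumes at least two scores are supplied, and a real system would carry many more angles and story points than this.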
Some parts of that outline are straightforward; others, like how “newsworthiness” is determined, can be more contentious, as Diakopoulos writes:
From my reading, I’d have to say that the Narrative Science patent seems to be the most informed by journalism. It stresses the notion of newsworthiness and editorial in crafting a narrative…What still seems to be lacking though is a broader sense of newsworthiness besides “deviance” in these algorithms. Harcup and O’Neill identified 10 modern newsworthiness values, each of which we might make an attempt at mimicking in code: reference to the power elite, reference to celebrities, entertainment, surprise, bad news, good news, magnitude (i.e. significance to a large number of people), cultural relevance to audience, follow-up, and newspaper agenda. How might robot journalists evolve when they have a fuller palette of editorial intents available to them?
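Diakopoulos's suggestion of mimicking those ten values in code could start from something as simple as a weighted score. The Python sketch below is only an illustration: the value names are shorthand for Harcup and O’Neill's list, while the 0-to-1 signal scale, the equal default weights, and the function name `editorial_score` are all assumptions, and the hard part (detecting each signal in real data) is left out entirely.

```python
# Shorthand labels for the ten news values listed above.
NEWS_VALUES = (
    "power_elite", "celebrity", "entertainment", "surprise", "bad_news",
    "good_news", "magnitude", "relevance", "follow_up", "newspaper_agenda",
)


def editorial_score(signals, weights=None):
    """Combine per-value signals (each assumed to be in 0..1) into one score in 0..1.

    `signals` maps a news value to how strongly a story exhibits it.
    `weights` is a hypothetical editorial profile; values it omits count as 0.
    """
    unknown = set(signals) - set(NEWS_VALUES)
    if unknown:
        raise ValueError(f"unknown news values: {sorted(unknown)}")
    if weights is None:
        weights = {value: 1.0 for value in NEWS_VALUES}  # assumption: equal weights
    total = sum(weights.get(value, 0.0) for value in NEWS_VALUES)
    if total == 0:
        return 0.0
    weighted = sum(
        weights.get(value, 0.0) * signals.get(value, 0.0) for value in NEWS_VALUES
    )
    return weighted / total
```

Different weight profiles would stand in for different editorial intents: a tabloid-style profile might weight `celebrity` and `entertainment` heavily, while a profile that only weights `surprise` falls back to the “deviance” notion the patents already use.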
— Joshua Benton