Nieman Journalism Lab
Pushing to the future of journalism — A project of the Nieman Foundation at Harvard

Bull beware: Truth goggles sniff out suspicious sentences in news

A graduate student at the MIT Media Lab is writing software that can highlight false claims in articles, just like spell check.

You’re reading a wrap-up of the Sept. 22 Republican presidential debate when you land on this claim from Rep. Michele Bachmann: “President Obama has the lowest public approval ratings of any president in modern times.”

Really? You start googling for evidence. Maybe you scour the blogs or the fact-checking sites. It takes work, all that critical thinking.

That’s why Dan Schultz, a graduate student at the MIT Media Lab (and newly named Knight-Mozilla fellow for 2012), is devoting his thesis to automatic bullshit detection. Schultz is building what he calls truth goggles — not actual magical eyewear, alas, but software that flags suspicious claims in news articles and helps readers determine their truthiness. It’s possible because of a novel arrangement: Schultz struck a deal with fact-checker PolitiFact for access to its private APIs.

If you had the truth goggles installed and came across Bachmann’s debate claim, the suspicious sentence might be highlighted. You would see right away that the congresswoman’s pants were on fire. And you could explore the data to discover that Bachmann, in fact, wears some of the more flammable pants in politics.

“I’m very interested in looking at ways to trigger people’s critical abilities so they think a little bit harder about what they’re reading…before adopting it into their worldview,” Schultz told me. It’s not that the truth isn’t out there, he says — it’s that it should be easier to find. He wants to embed critical thinking into news the way we embed photos and video today: “I want to bridge the gap between the corpus of facts and the actual media consumption experience.”

Imagine the possibilities, not just for news consumers but producers. Enhanced spell check for journalists! A suspicious sentence is underlined, offering more factual alternatives. Or maybe Clippy chimes in: “It looks like you’re lying to your readers!” The software could even be extended to email clients to debunk those chain letters from your crazy uncle in Florida.

Schultz is careful to clarify: His software is not designed to distinguish lies from truth on its own. That remains primarily the province of real humans. The software is being designed to detect words and phrases that show up in PolitiFact’s database, relying on PolitiFact’s researchers for the truth-telling. “It’s not just deciding what’s bullshit. It’s deciding what has been judged,” he said. “In other words, it’s picking out things that somebody identified as being potentially dubious.”
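
To make the lookup idea concrete, here is a minimal Python sketch of exact matching against a list of already-judged claims. It is not Schultz’s code and it does not touch PolitiFact’s private API, whose shape this story doesn’t describe. `CHECKED_CLAIMS`, `normalize`, and `flag_claims` are hypothetical names, and the two entries are simply the Bachmann claims and verdicts mentioned in this story.

```python
import re

# Hypothetical local snapshot of fact-checked claims (assumption: a real tool
# would load these from a fact-checker's database rather than hard-code them).
CHECKED_CLAIMS = {
    "president obama has the lowest public approval ratings of any president in modern times": "pants on fire",
    "we are spending 40 percent more than what we take in": "true",
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def flag_claims(article_text):
    """Return (claim, rating) pairs whose normalized text appears verbatim in the article."""
    haystack = normalize(article_text)
    return [(claim, rating) for claim, rating in CHECKED_CLAIMS.items()
            if claim in haystack]
```

Calling `flag_claims` on a sentence that quotes Bachmann’s approval-ratings line word for word returns that claim with its verdict. A reworded version of the same claim returns nothing, because this sketch only recognizes text that has already been judged and appears verbatim.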

That means the software might flag a Bachmann claim from another debate — “Our government right now — this is significant — we are spending 40 percent more than what we take in” — and mark it as true. PolitiFact had investigated that claim and the claim checked out.

Things get trickier when a claim is not a word-for-word match. For example, the reporter paraphrases: “Our government right now…[is] spending 40 percent more than what we take in,” Bachmann said. Or: Bachmann said government spending is 40 percent higher than revenue. It’s not easy for computers to understand the nuances of language the way we do.
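
One crude way to catch paraphrases is to compare the words two sentences share rather than demanding an exact match. The sketch below uses Jaccard similarity over non-stopword tokens. This is a stand-in technique chosen for illustration, not the method Schultz describes, and the stopword list, `content_words`, and `overlap` are all assumptions.

```python
import re

# Small illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "an", "the", "is", "are", "we", "our", "this",
             "than", "what", "in", "of", "said"}


def content_words(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def overlap(a, b):
    """Jaccard similarity of the two texts' content words, from 0.0 to 1.0."""
    words_a, words_b = content_words(a), content_words(b)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


CHECKED = "we are spending 40 percent more than what we take in"
PARAPHRASE = "Bachmann said government spending is 40 percent higher than revenue."
UNRELATED = "Schultz is a graduate student at the MIT Media Lab."
```

Here `overlap(PARAPHRASE, CHECKED)` comes out to one third, while `overlap(UNRELATED, CHECKED)` is zero. The paraphrase scores higher only because it shares “spending,” “40,” and “percent”; word overlap has no idea that “higher than revenue” means “more than what we take in,” which is exactly the nuance that makes the problem hard. Where to set a cutoff for flagging is a tuning decision this sketch leaves open.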

(An adviser at the Center for Civic Media, Ethan Zuckerman, is wrestling with the same ideas for his more meta-news literacy project, MediaRDI, which would stick nutritional labels on the news.)

Schultz’s work explores natural language processing, in which computers learn to talk the way we do. If you’ve ever met Siri, you’ve experienced NLP. Schultz’s colleagues at the Media Lab invented Luminoso, a tool for what the Lab calls “common sense computing.” The Luminoso database is loaded with simple descriptions of things: “Millions and millions of things…Like, ‘Food is eaten’ or ‘Bananas are fruit.’ Stuff like that, where a human knows it, but a computer doesn’t. You’re taking language and turning it into mathematical space. And through that you can find associations that wouldn’t just come out of looking at words as individual items but understanding words as interconnected objects.

“Knowing that something has four legs and fur, and knowing that a dog is an animal, a dog has four legs, and a dog has fur, might help you realize that, from a word you’ve never seen before, that it is an animal. So you can build these associations from common sense. Which is how humans, arguably, come to their own conclusions about things.”
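
A toy version of that dog example, in Python: each concept becomes a vector of yes-or-no assertions, and an unfamiliar word borrows the category of the known concept it most resembles. This is only an illustration of turning language into mathematical space. It is not how Luminoso works internally, and `FEATURES`, `KNOWN`, `CATEGORY`, and `guess_category` are made-up names and data.

```python
import math

# Toy assertion vocabulary (assumption: a real system holds millions of these).
FEATURES = ["has four legs", "has fur", "is eaten", "is a fruit"]

# Known concepts and the assertions that hold for them.
KNOWN = {
    "dog": {"has four legs", "has fur"},
    "banana": {"is eaten", "is a fruit"},
}
CATEGORY = {"dog": "animal", "banana": "food"}


def to_vector(assertions):
    """Encode a set of assertions as a 0/1 vector over FEATURES."""
    return [1.0 if feature in assertions else 0.0 for feature in FEATURES]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def guess_category(assertions):
    """Return the category of the most similar known concept, or None if nothing matches."""
    vec = to_vector(assertions)
    scores = {name: cosine(vec, to_vector(feats)) for name, feats in KNOWN.items()}
    best = max(scores, key=scores.get)
    return CATEGORY[best] if scores[best] > 0 else None
```

Given a never-before-seen word described only as having four legs and fur, `guess_category({"has four legs", "has fur"})` lands on “animal” because its vector lines up with the dog’s.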

Open-source versus for-profit

Schultz’s truth goggles will be made open-source once finished next year. PolitiFact, of course, is not open-source; it’s a business still trying to figure out how to monetize its data, said editor Bill Adair.

“Whether we’re included or not will be a decision we’ll make down the road,” Adair told me. “I think what he’s going to ultimately come up with is going to benefit all fact-checking news organizations, so I think we’ll be happy to be part of that. The goal is to get more accurate journalism in front of more people….My goal is not to get people to stop lying. I still believe strongly that the role of the journalists is to inform democracy and let people make decisions about their leaders.”

But even the strongest declaration of truth or falsehood can still spark dissent. It’s beyond the scope of his current project, but Schultz’s truth goggles would be stronger if they could draw from multiple sources. There could be specialty fact-checking sources for physics, or psychology. Or maybe Snopes.com could open up its data with an API.

More sources “would help people break away from their filter bubble. They would be exposed to opinions they hadn’t seen before,” Schultz said. “The ultimate goal is to enable intelligent conversations about contentious issues.”

  • Bic Sheaffer

    Was there no pepper spray @ Kent State?

  • http://disqus.com Peter Mullen

    Can’t wait to use this on Obama.  It would probably break under the volume of pure bullshit.

  • Eddie

    Politifact in fact has a special Obama section.  Of the 328 statements of his they’ve analyzed, they’ve found 79 were true, 74 were mostly true, 79 half-truths, 51 false, and 4 pants-on-fire.  Compare to Michelle Bachmann, whose numbers were 4, 2, 6, 7, 17, and 10, respectively.  Note that in all cases, there is a bias towards false statements because Politifact tends to analyze controversial statements, not obviously true statements.  Plus, well, they’re politicians, after all.

  • http://disqus.com Peter Mullen

    Comparing Obama to Michele Bachmann.  Now there’s a relevant comparison!   

  • Mikkel Islay

    “I WANDER’D lonely as a cloud
    That floats on high o’er vales and hills,
    When all at once I saw a crowd,
    A host, of golden daffodils;
    Beside the lake, beneath the trees,
    Fluttering and dancing in the breeze.
    Continuous as…..”

    Paperclip: “It looks like you are lying”.
    Paperclip: “The Wild Daffodil (Narcissus pseudonarcissus) is a member of the kingdom Plantae and therefore lacks the physiological adaptions required for dancing”.

    Context is quite important for judging the validity (as opposed to truthfulness) of statements.

  • Vinny

    I’m a bit worried about the term “liar” being used here. 

    If you say something which is factually incorrect, it does not necessarily mean you are a liar. To be a liar, the person would need to know that it was incorrect and say it anyway.   
    So unfortunately, to prove Bachmann is a liar, you would need to prove that she had all the correct facts and then ignored them. If she truly believed the claim, this would make her wrong, but not a liar.  

  • http://www.slifty.com Daniel Schultz

    This is definitely true! (although it is worth noting that “liar” wasn’t used here, unless I’m missing it)

    Just a related comment; the point of the project isn’t to disseminate truth or call people out as “liars” — it’s to inspire people to think and then provide information that will help them do that thinking (e.g. leverage the analysis of 3rd parties like Politifact)  In other words, I will be trying very hard to avoid charged language like “lie” as it will quickly turn people off.

  • http://fungibleconvictions.com/ Andrew Whitacre

    Dan, you’ve pitched this primarily as a tool for end-users (i.e., readers). But as Andrew P. points out with the Clippy/spell-check comparison, it could be used for producers as well.

    So how would you go about pitching your software to, say, a newsroom? Have you gotten a sense of how editors and reporters would respond to, “This can help you avoid misleading, view-from-nowhere ‘balance’ in your reporting. Professionally, it helps you and your editor call shenanigans before the story is even published; let’s call it a best practice. Publicly, you know readers really won’t abide b.s. in the name of balance when the facts are literally staring them in the face; call it shaming”?

    Would you expect reservations or, instead, a feeling of “Thank god, now we don’t have to spend time getting the other point of view when that view is verifiably wrong.”

  • Johnnyboycurtis

    Not really, Bachmann (with her far right Christian leading) has the potential to do more harm than good. Obama is just…useless? for either side of the political spectrum

  • Anonymous

    Too too bad.  It has been Politifact needing its ‘facts’ double checked more frequently than before.

  • http://www.facebook.com/people/Ilya-Beraha/1111149888 Ilya Beraha

    It’s fiction(poetry) metaphor embedded text which presupposes “untruth” masks.

  • http://www.goodhelpweb.com Goodhelp

    Lemme see…are you saying Bachmann told the truth 4 or just 2 times? Out of a total of how many analyzed statements? Just asking.

  • http://www.goodhelpweb.com Goodhelp

    In this case then, would the correct descriptive be “unwitting prevaricator?” Ignorant “BS” artist? Untruthful non-liar? Help me here, Vinny!

  • http://www.goodhelpweb.com Goodhelp

    Best pitch: This tool saves you hours of Googling(tm)…the data is ready now!

  • http://www.goodhelpweb.com Goodhelp

    Is it possible nearly 20 Republican candidate debates and a two year presidential campaign cycle has anything to do with generating more facts to be checked more frequently than before?

  • Anonymous

    So you’re suggesting that such (mis)judgements like giving VP Biden a “half-true” because he used statistics reported by the actual city police (Flint, Mi), while Politifact used numbers from the FBI are caused by time constraint?  Overwork?  Too ambitious a scope?

    I think I’ll go with the prevailing opinion; Politifact is coloring its calls to avoid appearing “biased”.  Apparently they can live with “not credible”.

  • Anonymous

    Surely the software could be reduced to something like:

       if (politician==TRUE && lips_moving==TRUE)
       {
          lying = TRUE;
       }

  • http://www.goodhelpweb.com Goodhelp

    The sheer volume of reportage increases the need to fact-check an increased number of alleged facts by any candidate. Increasing the accuracy of Politifact is an added bonus.

    You failed to mention the overall state of journalism today is often that of a stenographer for newsmakers instead of a channel of communication which can ferret out the “real” from the bogus statements. For those who live in a fact-based world, this is dreadful. The fourth estate exists to accurately  inform the public so it can make intelligent judgements and voting decisions.

    Yet, the Murdoch media empire (and others) seek to “dumb-down” the population and repeat “not credible” statements often enough that their viewers/readers believe it.

    Do you think it takes a massive quantitative analysis of all Republican quotes vs all Democratic candidate quotes to determine which end of the spectrum pumps out more “not credible” statements?

  • http://www.lexalexander.net lexalexander

    I’d call it a lie if the candidate 1) knew it was untrue and said it anyway, or 2) said it with reckless disregard for whether or not it was true. This approach would, I hope, reduce the incidence of candidates’ pulling BS out of their rear ends on the spur of the moment, which happens far more often than such intentional untruths as the recent Mitt Romney ad.

  • Al

    It’s Interesting that a “comment” about Obama vs. Bachmann stating that Obama has more “falsehoods” than Bachmann is in itself false. 
    Bachmann: out of 48 statements Bachmann was mostly false, 7x, false 18x, pants on fire 10x. 
    Obama: out of 325 statements Obama was mostly false 41x, false 51x, pants on fire 5x.
    I’m not going to do all the work for you. Go find an eight year old and they’ll show you how to do percentages (%). 
    Also, when it comes to Obama vs Bachmann you need to factor in the number of “opportunities” to state a falsehood. Naturally, just about all of the President’s comments are noted. Michele Bachmann, not so much. Does anyone besides, Michele Bachmann, pay any attention to what comes out of her pie hole? 

  • Bill Williams

    Interesting, so as regards documentations of saying things that were not true, Obama has 79+51+4=134, and Bachmann has 6+7+17=30.

  • http://ducknetweb.blogspot.com/ Medicalquack

    Yes as consumers we do need to learn how to use aggregated data I agree.  There’s way too much out there that has been skewed, spun and just a lot of data bases with inaccurate data and you can’t always believe what you see and read to be gospel, thus I say we need to “Occupy Algorithms” to clean some of this up.  I have blogged about a lot of this in healthcare for the last couple of years.

    http://ducknetweb.blogspot.com/2011/11/occupy-algorithmsthe-attack-of-killer.html

  • BB1

    Yup… or you’d discover that the pillars of your reality are denial & delusion. 

  • Anonymous

    What was the spending under republican Bush?

  • Buckaroo

    I do not think that the comparison of Mitt Romney to President Obama would be any more to his liking!

  • http://disqus.com Peter Mullen

    The stakes, however, could not be more different.  Obama’s lies or flip flops are devastating in their significance, even if not covered by a complicit and fawning mainstream media.  Romney is no media darling, but his flip flops pale in comparison.

  • http://www.facebook.com/hans.anderton Hans Anderton

    There are two issues: what the terms denote, and how you assess the accuracy/utility of a set of projections. I’m not remotely expert enough to talk about how the assessment should be done for “truth” projections (or whether how it’s currently being done is adequate);

    I was just pointing out that these terms when used by “experts” have the meanings that “experts”, rather than colloquial usage, define them to have, in the same way that the term “function” means different things to mathematicians, to computer programmers and people in general.
