Want to know how many times Beyoncé has visited the White House?
How many bills has Rep. Darrell Issa sponsored?
How many Dominicans live in New York City?
The idea of “a Siri or Wolfram Alpha for government data,” something that can connect natural language queries with multifaceted datasets, had been kicking around in the mind of MIT Media Lab and Knight-Mozilla veteran Dan Schultz ever since a Knight Foundation-sponsored election-year brainstorming session in 2011. But CivOmega, a new data-mining tool designed to answer questions about government and civic life, only became a reality after this year’s Knight-Mozilla OpenNews Hack Day late last month.
The Hack Day was held in conjunction with the Knight-MIT Civic Media Conference last month (which we summarized here). This year’s theme was Insiders/Outsiders, so the focus of the hack day was making data more accessible to “outsiders” — those who don’t have the luxury of time or advanced coding skills to parse enormous caches of data. After all, as Schultz wrote in a blog post introducing CivOmega: “If nerds and people who have too much time on their hands are the only ones who can use government data then it won’t change the world. Plus, why should those people get to decide what is and isn’t important?”
Schultz teamed up with Knight Foundation’s Christopher Sopher, Knight Lab’s Joe Germuska, Knight-Mozilla fellows Mike Tigas, Manuel Aristaran, and Friedrich Lindenberg, The Texas Tribune’s Travis Swicegood, and more (all collaborators are listed here) to code the project. What they ended up with was a prototype for a civic data search engine with the potential to make huge swathes of government data legible to average citizens. Users type a question into CivOmega in plain English and receive a number of possible answers, complete with links to sources and information about the calculations.
“If you go to Data.gov and browse data, or Census.gov even, it’s just a bunch of numbers. A lot of times, open data is just kind of, ‘Here’s a mass of data without context,’” Tigas told me. “So by providing ways for people to find the very local impact of certain data sets or bits of information (like ‘How many people live around me? How many car accidents happened in my neighborhood?’), CivOmega and tools like this, they kind of help bridge that gap between being completely transparent, where people can understand data, and the umbrella of open data.”
Using some ready-made APIs from the Sunlight Foundation (creators of CapitolWords and Congress for iOS) and Germuska’s Census Reporter, the hack day team built CivOmega to be “super-modular,” as Tigas said, so individual hackers or journalists would be able to plug in their own APIs and create parsers. The source code, along with a simple how-to guide to using (and contributing to) CivOmega, is available on GitHub.
“The whole thing right now is open source, so there’s already a few modules that we have built that show you, if you have an API that can take text searches, and if you can think of ways to craft questions around your API that people in theory would ask, you can basically use our code as a starting point to bolt your API into CivOmega, and use this as kind of like a search engine for free,” Tigas said.
“The vision would be to provide a toolkit that would allow any API — anyone who’s passionate about their data set or about their API, they could use some tools that we provide that make it really easy to map their API to the questions,” Schultz said.
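To make the “bolt your API in” idea concrete, here is a minimal Python sketch of how a plug-in registry of this kind might look. It is not CivOmega’s actual code: the `register` decorator, the `Answer` and `Module` types, the `ask` function, and the stubbed bill counts are all hypothetical, invented for illustration. Each module contributes one question pattern and one handler, and every question is offered to every registered module.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Answer:
    text: str
    source: str  # where the number came from, so users can check it


@dataclass
class Module:
    name: str
    pattern: re.Pattern
    handler: Callable[..., Answer]


# Hypothetical shared registry; each contributor adds modules to it.
REGISTRY: List[Module] = []


def register(name: str, pattern: str):
    """Decorator that attaches a handler and its question pattern to the registry."""
    def wrap(func):
        REGISTRY.append(Module(name, re.compile(pattern, re.IGNORECASE), func))
        return func
    return wrap


@register("bills", r"^how many bills has (?P<legislator>.+?) sponsored\??$")
def bills_sponsored(legislator: str) -> Answer:
    # A real module would query its own API here. These counts are placeholders.
    placeholder_counts = {"rep. example": 12}
    count = placeholder_counts.get(legislator.lower())
    if count is None:
        return Answer(f"No data for {legislator}", source="hypothetical-bills-api")
    return Answer(f"{legislator} has sponsored {count} bills",
                  source="hypothetical-bills-api")


def ask(question: str) -> List[Answer]:
    """Offer the question to every module and collect answers from those that match."""
    answers = []
    for module in REGISTRY:
        match = module.pattern.match(question.strip())
        if match:
            answers.append(module.handler(**match.groupdict()))
    return answers
```

In this sketch, a contributor never touches the dispatcher: adding a data source means writing one decorated function, which is roughly the low-effort path Tigas and Schultz describe.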
CivOmega’s potential is especially exciting in light of an executive order President Obama issued in May, which mandates that “the default state of new and modernized Government information resources shall be open and machine readable.” The move, as Alex Howard noted in Slate, is a really big deal when it comes to leveraging data for innovation and efficiency — and means that there will be much more information for tools like CivOmega to make accessible to the public.
“The point is, the APIs are going to be out there — whether they’re going to be provided directly by the government or the Knight Foundations of the world or the individuals and organizations who care about it,” said Schultz.
Last week, Obama met with senior officials and cabinet members on the status of his open government initiatives in a presentation at the White House in which he emphasized the impact of the executive order and the Presidential Innovation Fellows program.
Open data projects, like the Department of Better Technology’s Screendoor and Knight News Challenge winner Outline, a real-time simulator of the financial impacts of public policy, were among the highlights of the Knight-MIT Civic Media Conference.
But the fact that data is open doesn’t mean it gets used — and that means it’s important that tools like CivOmega make it beyond the prototype stage. (The project’s GitHub repo was last updated, at this writing, on June 27.)
“Right now, it kind of works like a Mad Lib: We use exact sentences, and you kind of replace the nouns or whatever the subject is that you want to find data on: ‘How many X live in New York,’ where we know X is some demographic slice,” Tigas said. “The thing we really want to work on, other than just adding more data sources, is to get in touch with people who do natural language processing, which is how you’d be able to ask more broad questions rather than just filling in the blanks.”
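The Mad Lib approach Tigas describes can be sketched in a few lines of Python. This is an illustration under assumptions rather than the project’s implementation: the `{slot}` template syntax and the `compile_template` and `fill_blanks` helpers are hypothetical names. The sketch turns a fixed sentence with blanks into a regular expression and reads the blanks back out of the user’s question.

```python
import re
from typing import Dict, Optional


def compile_template(template: str) -> re.Pattern:
    """Turn 'How many {group} live in {place}' into an anchored regex."""
    # Splitting on a capturing group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)        # fixed words must match exactly
        else:
            regex += f"(?P<{part}>.+?)"     # a blank the user fills in
    # Allow an optional trailing question mark.
    return re.compile(rf"^{regex}\??$", re.IGNORECASE)


def fill_blanks(template: str, question: str) -> Optional[Dict[str, str]]:
    """Return the filled-in blanks, or None if the question does not fit."""
    match = compile_template(template).match(question.strip())
    return match.groupdict() if match else None
```

Calling `fill_blanks("How many {group} live in {place}", "How many Dominicans live in New York City?")` yields the two blanks, while a rephrased question such as “What is the Dominican population of New York?” returns `None`. That rigidity is the limitation natural language processing would be meant to address.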
“There’s a lot to be improved going forward, in terms of the patterns that it’s able to detect and how it’s able to detect those patterns, and increasing the scope of the kinds of data sets it’s using,” Schultz said. “Going forward, the vision would be to basically have a way of developing a more robust way of matching questions to answers. Once we’ve done that, then we can start to do what Wolfram Alpha has done: it’s smart enough to connect it to different answers that might not be answers to the questions you’re asking.”
CivOmega has a better chance of a long and useful life than some hack day projects because it was built to be scalable: Individual developers, not CivOmega’s creators, are responsible for making sure their modules work. “The important thing from our perspective is going to be to make sure that there is a good way of alerting developers that their module isn’t working anymore,” said Schultz. “In other words, it’s sort of a decentralized maintenance proposition.
“The point of CivOmega is to make this government data more useful,” he said. “It’s not like the existence of CivOmega will make that happen — what we need to do is make it really easy for a developer to spend not very much time bridging that gap between an API and the human question.”
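One way to picture the alerting Schultz mentions is a periodic self-test, sketched below in Python. Everything here is an assumption made for illustration: the `ModuleCheck` record, the idea that each module ships a sample question and a maintainer contact, and the `find_broken` function are not taken from the CivOmega codebase. The sketch runs each module against its own sample question and collects a message for every module that fails or returns nothing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ModuleCheck:
    name: str
    maintainer: str                 # hypothetical contact for alerts
    sample_question: str            # a question the module should always answer
    answer: Callable[[str], str]    # the module's entry point


def find_broken(modules: List[ModuleCheck]) -> List[str]:
    """Run each module's sample question and describe any that fail."""
    alerts = []
    for mod in modules:
        try:
            result = mod.answer(mod.sample_question)
            if not result:
                raise ValueError("empty answer")
        except Exception as exc:
            # One failing module must not stop the others from being checked.
            alerts.append(f"{mod.name} ({mod.maintainer}): {exc}")
    return alerts
```

The returned messages could then be emailed or posted wherever maintainers will see them, which keeps the burden of fixing a module with the developer who contributed it.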
Photo by justgrimes used under a Creative Commons license.