Nieman Foundation at Harvard
Aug. 5, 2010, 1 p.m.

How The Guardian is pioneering data journalism with free tools

The Guardian takes data journalism seriously. They obtain, format, and publish journalistically interesting data sets on their Data Blog, they track transparency initiatives in their searchable index of world government data, and they do original research on data they’ve obtained, such as their amazing in-depth analysis of 90,000 leaked Afghanistan war documents. And they do most of this with simple, free tools.

Data Blog editor Simon Rogers gave me an action-packed interview in The Guardian’s London newsroom, starting with story walkthroughs and ending with a philosophical discussion about the changing role of data in journalism. It’s a must-watch if you’re wondering what the digitization of the world’s facts means for a newsroom. Here’s my take on the highlights; a full transcript is below.

The technology involved is surprisingly simple, and mostly free. The Guardian uses public, read-only Google Spreadsheets to share the data they’ve collected, which require no special tools for viewing and can be downloaded in just about any desired format. Visualizations are mostly via Many Eyes and Timetric, both free.

Data Blog posts often relate to or support news stories, but not always. Rogers sees the publishing of interesting data as a journalistic act that stands alone, and is clear on where the newsroom adds value:

I think you have to apply journalistic treatment to data. You have to choose the data in a selective, editorial fashion. And I think you have to process it in a way that makes it easy for people to use, and useful to people.

The Guardian curates far more data than it creates. Some data sets are generated in-house, such as its yearly executive pay surveys, but more often the data already exists in some form, such as a PDF on a government web site. The Guardian finds such documents, scrapes the data into spreadsheets, cleans it, and adds context in a Data Blog post. But they also maintain an index of world government data which scrapes open government web sites to produce a searchable index of available data sets.

“Helping people find the data, that’s our mission here,” says Rogers. “We want people to come to us when they’re looking for data.”

In alignment with their open strategy, The Guardian encourages re-use and mashups of their data. Readers can submit apps and visualizations that they’ve created, but data has proven to be just as popular with non-developers — regular folks who want the raw information.

Sometimes readers provide additional data or important feedback, typically through the comments on each post. Rogers gives the example of a reader who wrote in to say that the Academy schools listed in his area in a Guardian data set were in wealthy neighborhoods, raising the journalistically interesting question of whether wealthier schools were more likely to take advantage of this charter school-like program. Expanding on this idea, Rogers says,

What used to happen is that we were the kind of gatekeepers to this information. We would keep it to ourselves. So we didn’t want our rivals to get ahold of it, and give them stories. We’d be giving stories away. And we wouldn’t believe that people out there in the world would have any contribution to make towards that.

Now, that’s all changed now. I think now we’ve realized that actually, we’re not always the experts. Be it Doctor Who or Academy schools, there’s somebody out there who knows a lot more than you do, and can thus contribute.

So you can get stories back from them, in a way…If you put the information out there, you always get a return. You get people coming back.

Perhaps surprisingly, data also gets pretty good traffic, with the Data Blog logging a million hits a month during the recent election coverage. “In the firmament of Guardian web sites that’s not bad. That’s kind of upper tier,” says Rogers. “And this is only after being around for a year.” (The even younger Texas Tribune also finds its data pages popular, accounting for a third of total page views.)

Rogers and I also discussed the process of getting useful data out of inept or uncooperative governments, the changing role of data specialists in the newsroom, and how the Guardian tapped its readers to produce the definitive database of Doctor Who villains. Here’s the transcript, lightly edited.

JS: All right. So. I’m here with Simon Rogers in the Guardian newsroom in London, and you’re the editor of the Data Blog.

SR: That’s right, and I’m also a news editor so I work across the organization on data journalism, essentially.

JS: So, first of all, can you tell us what the Data Blog is?

SR: Ok, well basically it came about because, as I said I was a news editor working a lot with graphics, and we realized we were just collecting enormous amounts of data. And we thought, well wouldn’t our readers be interested in seeing that? And when the Guardian Open Platform launched, it seemed a good time to think about opening up– we were opening up the Guardian to technical development, so it seemed a good time to open up our data collections as well.

And also it’s the fact that increasingly we’ve found people are after raw information. If you looked– and there’s lots of raw information online, but if you start searching for that information you just get bewildering amounts of replies back. If you’re looking for, say, carbon emissions, you get millions of entries back. So how do you know what the right set of data is? Whereas we’ve already done that set of work for our readers, because we’ve had to find that data, and we’ve had to choose it, and make an editorial selection about it, I suppose. So we thought we were able to cut out the middle man for people.

But also we kind of thought when we launched it, actually, what we’d be doing is creating data for developers. There seemed to be a lot of developers out there at that point who were interested in raw information, and they would be the people who would use the data blog, and the open platform would get a lot more traffic.

And what actually happened, what’s been interesting about it, is that– what’s actually happened is that it’s been real people who have been using the Data Blog, as much as developers. Probably more so than developers.

JS: What do you mean “real people”?

SR: Real people, I suppose what I mean is that, somebody who’s just interested in finding out what a number is. So for instance, here at the moment we’ve got a big story about a government scheme for building schools, which has just been cut by the new government. It was set up by the old government, who invested millions of pounds into building new school buildings. And so, we’ve got the full list of all the schools, by the parliamentary constituency that they’re in, and where they are and what kind of project they were. And that is really, really popular today, that’s one of our biggest things, because there’s a lot of demonstrations about it, it’s a big issue of the day. And so I would guess that 90% of people looking at it are just people who want to find out what the real raw data is.

And that’s the great thing about the internet, it gives you access to the raw, real information. And I think that’s what people really crave. They want the interpretation and the analysis from people, but they also want the veracity of seeing the real thing, without having it aggregated or put together. They just want to see the raw data.

JS: So you publish all of the original numbers that you get from the government?

SR: Well exactly. The only time– with the Data Blog, I try to make it as newsy as possible. So it’s often hooked around news stories of the day. Partly because it helps the traffic, and you’re kind of hooking on to existing requirements.

Obviously we do– it’s just a really eclectic mix of data. And I can show you the screen, for a sec.

JS: All right. Let’s see something.

SR: Okay, so this is the data blog today. So obviously we’ve got Afghanistan at the top. Afghanistan is often at the top at the moment. This is a full list of everybody who’s died, every British casualty who’s died and been wounded over time. So you’ve got this data here. We use, I tend to use a lot of third party services. This is a company called Timetric, who are very good at visualizing time series data. It takes about five minutes to create that, and you can roll over and get more information.

JS: So is that a free service?

SR: Yeah, absolutely free, you just sign up, and you share it. It works a bit like Many Eyes, you know the IBM service.

JS: Yeah.

SR: We’ll embed these Google docs. We use Google docs, Google spreadsheets to share all our information because it’s very easy for people to download it. So say you want to download this data. You click on the link, and it will take you through in a second to, there you go, it’s the full Google spreadsheet. And you’ve got everything on here. You’ve got, these are monthly totals, which you can’t get anywhere else, because nobody else does that information.

JS: What do you mean nobody else does it?

SR: Well nobody else bothers to put it together month by month. You can get totals by year from, iCasualties I think do it, but we’ve just collected some month by month, because often we’ve had to draw graphics where it’s month by month. It’s the kind of thing, actually it’s quite interesting to be able to see which month was the worst for casualties.
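For readers who want to try the month-by-month tally Rogers describes, here is a minimal sketch in Python. It assumes you have one date per recorded incident; the dates below are invented placeholders, not figures from the Guardian’s spreadsheet, and the function names are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical records: one date per incident. These dates are invented
# for illustration and are NOT taken from the Guardian's data.
incident_dates = [
    date(2009, 7, 1),
    date(2009, 7, 10),
    date(2009, 7, 25),
    date(2009, 8, 3),
    date(2010, 1, 15),
]

def monthly_totals(dates):
    """Count records per (year, month), returned in chronological order."""
    counts = Counter((d.year, d.month) for d in dates)
    return dict(sorted(counts.items()))

def worst_month(totals):
    """Return the (year, month) key with the highest count."""
    return max(totals, key=totals.get)

totals = monthly_totals(incident_dates)
print(totals)
print(worst_month(totals))
```

Grouping on a (year, month) pair rather than on the month alone keeps July 2009 and July 2010 separate, which is what a time-series graphic needs.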

We’ve got lists of names, which obviously are in a few places. We collect Afghanistan wounded statistics which are terribly confused in the UK, because what they do is they try and make them as complicated as possible. So, the most serious ones, NOTICAS is where your next of kin is notified. That’s a serious event, but also you’ve got all those people evacuated. So anyway, this kind of data. We also keep amputation data, which is a new set that the government refused to release until recently, and a Guardian reporter was instrumental in getting this data released. So we kind of thought, maybe we should make this available for people.

So you get all this data, and then what you can do, if you click on “File” there, you can download it as Excel, XML, CSV, or whatever format you want. So that’s why we use Google spreadsheets. It’s the kind of thing that’s a very, very easily accessible format for people.

So really what we do is we try and encourage a community, a community to grow up around data and information. So every post has got a talk facility on it.

Anyway, going through it. So this is today’s Data Blog, where you’ve got Afghanistan, Academy schools in the UK. The schools are run by the state, pretty much.

JS: So just to clarify this for the American audience, what’s an Academy school?

SR: Ok, well basically in the UK most schools are state schools, that most children go to. State schools are, we all pay for them, they’re paid for out of our taxes. And they’re run at a local level, which obviously has its advantages because it means that you are, kind of, working to an area. What the new government’s proposing to do is allow any school that wants to to become an Academy. And what an Academy is is a school that can run its own finances, and own affairs.

And what we’ve got is we’ve got the data, the government’s published the data — as a PDF of course because governments always publish everything as a PDF, in this country anyway — and what they give you, which we’ve scraped here, is a list of every school in the UK which has expressed an interest. So you’ve got the local authority here, the name of the school, type of school, the address, and the post code. Which is great, because that’s good data, and because it’s on a PDF we can get that into a spreadsheet quite easily.

JS: So did you have to type in all of those things from a PDF, or cut and paste them?

SR: Good god no. No, no, we have, luckily we’ve got a really good editorial support team here, who, thanks to the Data Blog, are becoming very experienced at getting data off of PDFs. Because every government department would much rather publish something as a PDF, so they can act as if they’re publishing the data but really it’s not open.

JS: So that’s interesting, because in the UK and the US there’s this big government publicity about, you know, we’re publishing all this data.

SR: Absolutely.

JS: But you’re saying that actually–

SR: It’s not 100 percent yet. So, I’ll show you in a second that what they tend to do is just publish– most government departments still want to publish stuff as PDFs. They can’t quite get out of that thing. Or want to say, why would somebody want a spreadsheet? They don’t really get it. A lot of people don’t get it.

And, we wanted the spreadsheet so you can do stuff like this, which is, this is a map of schools interested in becoming Academies by area. And so because we have that raw data in spreadsheet form we can work out how many in the area. You can see suddenly that this part of England, Kent, has 99 schools, which is the biggest in the country. And only one area, which is Barking, up here, in London, which is, sorry, is down here in London, but anyway that has no schools applying at all.
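The per-area count behind a map like this is simple once the list is in spreadsheet form. The following is a rough sketch in Python, assuming a CSV export with one row per school; the column names, school names, and postcodes are invented placeholders, and this is not the Guardian’s in-house mapping tool.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV export with one row per school. Column names and rows
# are placeholders for illustration, not the government's actual list.
SAMPLE_CSV = """local_authority,school_name,school_type,postcode
Kent,Example Grammar School,Secondary,XX1 1XX
Kent,Example Primary School,Primary,XX1 2XX
Derbyshire,Sample High School,Secondary,XX2 1XX
"""

def schools_per_authority(csv_text, all_authorities):
    """Count schools per local authority, keeping areas with zero schools."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row["local_authority"] for row in rows)
    # Listing every authority explicitly means an area with no applicants
    # appears with a count of 0 instead of silently dropping out.
    return {name: counts.get(name, 0) for name in all_authorities}

counts = schools_per_authority(SAMPLE_CSV, ["Kent", "Derbyshire", "Barking"])
for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(name, n)
```

Passing in the full list of authorities matters for the kind of finding Rogers mentions: an area with no schools applying only shows up if you count against a complete list of areas.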

And the government also always said at the beginning that it would mainly be schools which weren’t “outstanding” that would apply. But actually if you look at the figures, which again, we can do, the majority of them are outstanding schools. So they’re already schools which are good, which are applying to become academies. Which kind of isn’t the point. But that kind of analysis, that’s data journalism in a sense. It’s using the numbers to get a story, and to tell a story.

JS: And how long did that story take you to put together? To get the numbers, and do the graphics, and…?

SR: Well, I was helped a bit, because I got, I’ve had one of my helpers who works in editorial support to get the data onto a spreadsheet. And in terms of creating the graphic we have a fantastic tool here, which is set up by one of our technical development team who are over there, and what it does, is it allows you to paste a load of data, geographic data, into this box, and you tell it what kind, is it parliamentary constituency, or local authority, or educational authority, or whatever, however the different regional differentiations we have in the UK, and it will draw a map for you. So this map here was drawn by computer, basically, and then one of the graphics guys helped sort out the labels and finesse it and make it look beautiful. But it saves you the hard work of coloring up all those things. So actually that took me maybe a couple of hours. In total.

JS: How about getting the data, how long did that take?

SR: Oh well luckily that data– you know the government makes the data available. But like I say, as a PDF file. So this is the government site, and that’s the list there, and you open it, it opens as a PDF. Because we’ll link to that.

But luckily the guys in the ESD [editorial services department] are very adept now, because of the Data Blog, at getting data into spreadsheets. So, you know they can do that in 20 minutes.

JS: So how many people are working on data overall, then?

SR: Well, in terms of– it’s my full time job to do it. I’m lucky in that I’ve got an awful lot of people around here who have got an interest who I can kind of go and nudge, and ask. It’s a very informal basis, and we’re looking to formalize that, at the moment. We’re working on a whole data strategy, and where it goes. So we’re hoping to kind of make all of these arrangements a bit more formal. But at the moment I have to fit into what other people are doing. But yeah, we’ve got a good team now that can help, and that’s really a unique thing.

So I was going through the Data Blog for you. So this is a typical, a weird day, so schools, and then we’ve got another schools thing because it’s a big schools day today. This is school building projects scrapped by constituency, full list. Now, this is another where the government didn’t make the data easily available. The department for education published a list of all the school projects that were going to be stopped when the government cut the funding, some of which is going towards creating Academy schools, which is why this is a bit of an issue in the country at the moment. And we want to know by constituency how it was working. So which MPs were having the most school projects cut, in their constituency. And we couldn’t get that list out of the department of education, but one MP had lodged it with the House of Commons library. So we managed to get it from the House of Commons library. But it didn’t come in a good form, it came in a PDF again, so again we had to get someone from tech to sort it out for us.

But the great thing is that we can do something like this, which is a map of projects stopped by constituency, by MP. And most of the projects we’ve stopped were in Labour seats. As you know Labour are not in power at the moment. So we can do some of this sort of analysis which is great. So there were 418 projects stopped in Labour constituent seats, and 268 stopped in Conservative seats. So basically 40% of Labour MPs had a project stopped, at least one project stopped in their seat, compared to only 27% of Conservatives, and 24% of the Lib Dems who are in power at the moment.

JS: So would it be accurate to say the data drove this story, or showed this story, or…?

SR: Data showed this story, which is great, but the one thing, the caveat — of course, the raw numbers are never 100% — the caveat was there were more projects going on in Labour areas because Labour government, previous government which is Labour set up the projects, and they gave more projects to Labour areas. So you can read it either way.

JS: And you said this in the story?

SR: We said this in the story. Absolutely. We always try and make the caveats available for people. So that’s a big story today, because there are demonstrations about it in London. You’ve come to us on a very education-centered day today.
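The party comparison and its caveat can be made concrete with two different ratios. This is an illustrative sketch in Python only: every number below is invented, the record layout is an assumption, and it is not the Guardian’s analysis. The first function gives the share of a party’s seats with at least one stopped project; the second gives the share of that party’s projects that were stopped, which accounts for how many projects each party’s seats had to begin with.

```python
# All figures below are invented for illustration; they are NOT the
# Guardian's numbers.
# Each record: (constituency, party of MP, projects under way, projects stopped)
seats = [
    ("Seat A", "Labour", 4, 3),
    ("Seat B", "Labour", 2, 0),
    ("Seat C", "Conservative", 1, 1),
    ("Seat D", "Conservative", 0, 0),
    ("Seat E", "Conservative", 0, 0),
]

def share_of_seats_affected(rows, party):
    """Fraction of a party's seats with at least one stopped project."""
    party_rows = [r for r in rows if r[1] == party]
    if not party_rows:
        return 0.0
    affected = sum(1 for r in party_rows if r[3] > 0)
    return affected / len(party_rows)

def share_of_projects_stopped(rows, party):
    """Fraction of the projects in a party's seats that were stopped."""
    party_rows = [r for r in rows if r[1] == party]
    total = sum(r[2] for r in party_rows)
    if total == 0:
        return 0.0
    return sum(r[3] for r in party_rows) / total

for party in ("Labour", "Conservative"):
    print(party,
          round(share_of_seats_affected(seats, party), 2),
          round(share_of_projects_stopped(seats, party), 2))
```

In this made-up sample the first ratio is higher for Labour while the second is higher for the Conservatives, which is the sense in which the same list can be read either way.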

But there’s other stuff on the blog too. This is a very British thing. We did this because we thought it would be an interesting project to do. I had somebody in for a week and they didn’t have much to do so I got them to make a list of every Doctor Who villain ever.

JS: This was an intern project?

SR: This was an intern project. We kinda thought, yeah, we’ll get a bit of traffic. And we’ve never had so much involvement in a single piece ever. It’s had 500 retweets, and when you think most pieces will get 30 or 40, it’s kind of interesting. The traffic has been through the roof. And the great thing is, so we created–

JS: Ooh, what’s this? This is good.

SR: It’s quite an easy– we use ManyEyes quite a lot, which is very very quick to create lovely little graphics. And this is every single Doctor Who villain since the start of the program, and how many times they appear. So you see the Daleks lead the way in Doctor Who.

JS: Yeah, absolutely.

SR: Followed by the Cybermen, and the Master’s in there a lot. And there are lots of other little things. But we started off with about 106 villains in total, and now we’re up to– we put it out there and we said to people, we know this isn’t going to be the complete list, can you help us? And now we’ve got 212. So my weekend has basically been– I’ll show you the data sheet, it’s amazing. You can see the comments are incredible. You see these kinds of things, “so what about the Sea Devils? The Zygons?” and so on.

And I’ll show you the data set, because it’s quite interesting. So this is the data set. Again Google docs. And you can see over here on the right hand side, this is how many people looking at it at any one time. So at that moment there are 11 people looking on. There could be 40 or 50 people looking at any one moment. And they’re looking and they’re helping us make corrections.

JS: So, wait– this data set is editable?

SR: No, we haven’t made it editable, because we’ve had a bad experience with people coming to editable ones and mucking around, you know, putting swear words on stuff.

JS: So how do they help you?

SR: Well they’ll put stuff in the comments field and I’ll go in and put it on the spreadsheet. Because I want a sheet that people can still download. So now we’ve got, we’re now up to 203. We’ve doubled the amount of villains thanks to our readers. It’s Doctor Who. And it just shows we’re an eclectic– we’re a broad church on the Data Blog. Everything can be data. And that’s data. We’ve got number of appearances per villain, and it’s a program that people really care about. And it’s about as British as it’s possible to get. But then we also have other stuff too– and there we go, crashed again.

JS: Well let me just ask you a few questions, and take this opportunity to ask you some broader questions. Because we can do this all day. And I have. I’ve spent hours on your data blog because I’m a data geek. But let’s sort of bring it to some general questions here.

SR: Okay. Go for it.

JS: So first of all, I notice you have the Data Blog, you also have the world data index.

SR: Yes. Now the idea of that was that, obviously lots of governments around the world have started to open up their data. And around the time that the British government was– a lot of developers here were involved in that project — we started to think, what can we do around this that would help people, because suddenly we’ve got lots of sites out there that are offering open government data. And we thought, what if we could just gather them all together into one place. So you’ve got a single search engine. And that’s how we set up the world data search. Sorry to point you at the screen again.

JS: No that’s fine, that’s fine.

SR: Basically, so what we did, we started off with just Australia, New Zealand, UK and America. And basically what this site does, is it searches all of these open government data sites. Now we’ve got Australia, Toronto in Canada, New Zealand, the UK, London, California, San Francisco, and

So say you search for “crime,” say you’re interested in crime. There you go. So you come back here, you see you’ve got results here from the UK, London, you’ve got results from in America, San Francisco, New Zealand and Australia. Say you’re interested in just seeing– you live in San Francisco and you’re only interested in San Francisco results. You’ve three results. And there you go, you click on that.

And you’re still within the Guardian site because what we’re asking people to do is help us rank the data, and submit visualizations and applications. So we want people to tell us what they’ve done with the data.

But anyway if you go and click on that, and you click on “download,” and it will start downloading the data for you. Or, what it will do is take you to the terms and conditions. We don’t bypass any T&Cs. The T&C’s come alongside. But you click on that, you agree to that, and then you get the data. So we really try and make it easy for people. There you go. And this is the crime incidence data. Very variable. This is great because it’s KML files, so if you wanted to visualize that you get really great information. It’s all sorts of stuff. Sometimes it’s CSVs.

JS: What’s a KML file?

SR: So, Google Earth.

JS: Okay.

SR: Sorry. So, it’s mapping, a mapping file straight away.
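For readers unfamiliar with the format: KML is an XML-based format for geographic data that Google Earth reads, and each point is stored as a longitude,latitude pair. As a minimal sketch, Python’s standard library can pull named points out of a KML document. The sample file below is a made-up placeholder, not the San Francisco data set, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The standard KML 2.2 namespace.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical one-point KML document, invented for illustration.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Example incident</name>
      <Point><coordinates>-122.4194,37.7749,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>
"""

def placemarks(kml_text):
    """Return (name, latitude, longitude) for each point Placemark."""
    root = ET.fromstring(kml_text)
    points = []
    for pm in root.iterfind(".//kml:Placemark", KML_NS):
        name = pm.findtext("kml:name", default="", namespaces=KML_NS)
        coords = pm.findtext(".//kml:Point/kml:coordinates",
                             default="", namespaces=KML_NS)
        if not coords.strip():
            continue  # skip placemarks that are not simple points
        # KML stores coordinates as longitude,latitude[,altitude]
        lon, lat = (float(v) for v in coords.strip().split(",")[:2])
        points.append((name, lat, lon))
    return points

print(placemarks(SAMPLE_KML))
```

Note the order: KML puts longitude first, so swapping the pair is a common source of points landing in the wrong place on a map.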

SR: Okay, so one of the things we ask people to do is to submit visualizations and applications they’ve produced. So for instance, London has some very very good open data. If you haven’t looked around the Data Store, it’s really worth going to. And one of these things they do is they provide a live feed of all the London traffic cameras. You can watch them live. And this is a lovely thing, because what somebody’s done is they’ve written an iPad application. So you can watch live TFL, Transport for London, traffic cameras on your iPad.

And you see that data set has been rated. A couple of people have gone in there and rated it. You’ve got a download button, the download is XML. So we try and help people around this data. And this is growing now. Every time somebody launches an open government data site we’re gonna put it on here, and we’re working on a few more at the moment. So we want it to be the place that people go to. Every time you Google “world government data” it pops up at the top, which is what you want. You want people who are just trying to compare different countries and don’t know where to start, to help them find a way through this maze of information that’s out there.

JS: So do you intend to do this for every country in the world?

SR: Every country in the world that launches an open government data site, we’ll whack it on here. And we’re working– at the moment there are about 20 decent open government data sites around the world. We’re picking those up. We’ve got on here now, how many have we got? One, two, three, four, five, six, seven, eight. We’ll have 20 on in the next couple of weeks. We’re really working through them at the moment.

And what this does is, it scrapes them. So basically, we don’t– for us it’s easy to manage because we don’t have to update these data sets all the time. The computer does that for us. But basically, what we do provide people with is context and background information, because you’re part of the data site there.

JS: So let me make sure I have this clear. So you’re not sucking down the actual data, you’re sucking down the list and descriptions of the data sets available?

SR: Absolutely. So we’re providing people, because basically we want it to be as updated as possible. We don’t– if we just uploaded onto our site, that would kind of be pointless, and it would mean it would be out of date. This way, if something pops up on and stays there, we’ll get it quick on here. We’ll help people find it. Helping people find the data, that’s our mission here. It’s not just generating traffic, it’s to help people find the information, because we want people to come to us when they’re looking for data.
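The design Rogers describes, indexing the descriptions of data sets and linking back to the source rather than copying the data, can be sketched as follows in Python. This is an illustration under assumptions, not the Guardian’s implementation: the class, the field names, the entries, and the example.org URLs are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetEntry:
    """Metadata only: the data itself stays on the source site."""
    site: str
    title: str
    description: str
    source_url: str

# Hypothetical catalogue entries; titles and URLs are placeholders.
CATALOGUE = [
    DatasetEntry("San Francisco", "Crime incidents",
                 "Reported crime incidents, KML",
                 "https://example.org/sf/crime"),
    DatasetEntry("London", "Recorded crime by borough",
                 "Monthly crime counts, CSV",
                 "https://example.org/london/crime"),
    DatasetEntry("London", "Traffic cameras",
                 "Live camera feed, XML",
                 "https://example.org/london/cameras"),
]

def search(catalogue, keyword, site=None):
    """Case-insensitive keyword match on title and description,
    optionally restricted to a single source site."""
    needle = keyword.lower()
    return [
        e for e in catalogue
        if needle in (e.title + " " + e.description).lower()
        and (site is None or e.site == site)
    ]

for entry in search(CATALOGUE, "crime"):
    print(entry.site, "|", entry.title, "|", entry.source_url)
```

Because each entry holds only a link, the index never serves stale copies of the data; keeping it current means re-reading the source sites’ listings, which is the part Rogers says the computer does.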

JS: So, okay. You’ve talked about, it sounds like, two different projects. The Data Blog, where you collect and clean up and present data that you–

SR: That we find interesting. We’re selective.

JS: In the process of the Guardian’s newsgathering.

SR: Yeah, and just things that are interesting anyway. So the Doctor Who post that we were just looking at is just interesting to do. It’s not anything we’re going to do a story about. And often they’ll be things that are in the news, say that day, and I’ll think “oh that’s a good thing to put on the Data Blog.” So it could be crime figures, or it could be– and sometimes, the side effect of that is a great side effect because you end up with a piece in the paper, or a piece on the web site. But often it might be the Data Blog is the only place to get that information.

JS: And you index world government data sites.

SR: Yeah, absolutely.

JS: Does the Guardian do anything else with data?

SR: Yeah, well what we do is, we’re doing a lot of Guardian research with data. So what we want to do is give people a kind of way into that. So for instance, we do do a lot of data-based projects. So for instance we’re doing an executive pay survey of all the biggest companies, how much they pay their bosses and their chief executives. That has always been a thing the paper’s always done for stories. And now what we’ll do is we’ll make that stuff available– that data available for people. So instead of just raw data journalism, it’s quite old data journalism. We’ve been doing it for ten years. But we used to just call it a survey. Now it’s data journalism, because it’s getting stories out of numbers. So we’ll work with that, and we’ll publish that information for people to see. And there are a couple of big projects coming up this week, which I really can’t tell you about, but next week it will be obvious what they are.

JS: Probably by the time this goes up we’ll be able to link to them.

[Simon was referring to the Guardian’s data journalism work on the leaked Afghanistan war logs, described in a thorough post on the Data Blog.]

SR: Yeah, I’ll mail you about them. But we’ve got now an area of expertise. So increasingly what I’m finding is that I’m getting people coming to me within The Guardian, saying, so we’ve got this spreadsheet, well how can I do this? So for instance that Academies thing we were just looking at, we were really keen to find out which areas were the most, where the most schools were, for the paper. The correspondent wanted to know that. So actually, because we’ve got this area of expertise now in managing data, we’re becoming kind of a go-to place within The Guardian, for journalists who are just writing stories where they need to know something, or they need to find some information out, which is an interesting side effect. Because it used to be that journalists were kind of scared of numbers, and scared of data. I really think that was the case. And now, increasingly, they’re trying to embrace that, and starting to realize you can get stories out of it.

JS: Well that’s really interesting. Let’s talk for a minute about how this applies to other newsrooms, because it’s– as you say, journalists have been traditionally scared of data.

SR: Yeah, absolutely. You could say they prided themselves, in this country anyway, they prided themselves on lack of mathematical ability. I would say.

JS: Which seems unfortunate in this era.

SR: Yeah, absolutely. Yeah, yeah, absolutely.

JS: But especially a lot of our readers are from smaller newsrooms, and so what kind of technical capability do you need to start tracking data, and publishing data sets?

SR: I think it’s really minimal. I mean, the thing is that actually, what we’re doing is really working with a basic, most of the time just basic spreadsheet packages. Excel or whatever you’ve got. Excel is easy to use, but it could be any package really. And we’re using Google spreadsheets, which again is widely available for people to do information. We’re using visualization tools which are again, ManyEyes or Timetric which are widely available and easy to use. I think what we’re doing is just bringing it together.

I think traditionally that journalists wouldn’t regard data journalism as journalism. It was research. Or, you know, how is publishing data– is that journalism? But I think now, what is happening is that actually, what used to happen is that we were the kind of gatekeepers to this information. We would keep it to ourselves. So we didn’t want our rivals to get ahold of it, and give them stories. We’d be giving stories away. And we wouldn’t believe that people out there in the world would have any contribution to make towards that. Now, that’s all changed now. I think now we’ve realized that actually, we’re not always the experts. Be it Doctor Who or Academy schools, there’s somebody out there who knows a lot more than you do, and can thus contribute. So you can get stories back from them, in a way. So we’re receiving the information much more.

JS: So you publish the data, and then other people build stories out of it, is that what you’re saying?

SR: Other people will let us know– well, we publish say, well that’s an interesting story, or this is a good visualization. We’ve published data for other people to visualize. We thought, that’s quite an interesting thing to mash it up with, we should do that ourselves. So there’s that thing, and there’s also the fact that if you put the information out there, you always get a return. You get people coming back.

So for instance the Academies thing today that we were talking about. We’ve had people come back saying, well I live in Derbyshire and I know that those schools are in quite wealthy areas. So we start to think, well is there a trend towards schools in wealthy areas going to this, and schools in poorer areas not going to this.

So it gives you extra stories or extra angles on stories you wouldn’t think of. And I think that’s part of it. And I think partly there’s just the realization that just publishing data in itself, because it’s interesting, is a journalistic enterprise. Because I think you have to apply journalistic treatment to that data. You have to choose the data in a selective, editorial fashion. And I think you have to process it in a way that makes it easy for people to use, and useful to people.

JS: So last question here, which is of course going to be on many editors’ and publishers’ minds.

SR: Sure.

JS: Let’s talk about traffic and money. How does this contribute to the business of The Guardian?

SR: Okay, it’s a new– it’s an experiment for us, but traffic-wise it’s been pretty healthy. We’ve had– during the election we were getting a million page impressions in a month. Which is not bad. On the Data Blog. Now, as a whole, out of the 36 million that The Guardian gets, it doesn’t seem like a lot. But actually, in the firmament of Guardian web sites that’s not bad. That’s kind of upper tier. And this is only after being around for a year.

So in terms of what it gives us, it gives the same as producing anything that produces traffic gives us. It’s good for the brand, and it’s good for The Guardian site. In the long run, I think that there is probably canny money to be made out of there, for organizations that can manage and interpret data. I don’t know exactly how, but I think we’d have to be pretty dumb if we don’t come up with something. I’d be very surprised. It’s an area where there’s such a lot of potential. There are people who don’t really know how to manage data and don’t really know how to organize data that– for us to get involved in that area. I really think that.

But also I think that just journalistically, it’s as important to do this as it is to write a piece about a fashion week or anything else we might employ a journalist to do. And in a way it’s more important, because if The Guardian is about open information, which– since the beginning of The Guardian we’ve campaigned for freedom of information and access to information, and this is the ultimate expression of that.

And we, on the site, we use the phrase “facts are sacred.” And this comes from the famous C. P. Scott who said that “comment is free,” which as you know is the name of our comment site, but “facts are sacred” was the second part of the saying. And I kinda think that is– you can see it on the comment site, there you go. “Comment is free, but facts are sacred.” And that’s what The Guardian’s about. I really think that, you know, this says a lot about the web. Interestingly, I think that’s how the web is changing, in the sense that a few years ago it was just about comment. People wanted to say what they thought. Now I think it’s, increasingly, people want to find out what the facts are.

JS: All right, well, thank you very much for a thorough introduction to The Guardian’s data work.

SR: Thanks a lot.

POSTED     Aug. 5, 2010, 1 p.m.