Dec. 18, 2012, 2:53 p.m.

How does Wikipedia deal with a mass shooting? A frenzied start gives way to a few core editors

A researcher who studies editing patterns on Wikipedia compares the Connecticut school shooting to other acts of mass violence to see how coverage of breaking news gets done on the site.

If you follow me on Twitter, you’re probably already well acquainted with my views on what should happen in the wake of the shooting spree that killed 20 children and six educators at a suburban elementary school in Newtown, Connecticut. This post, however, will build on my previous analysis of the Wikipedia article about the Aurora shootings, as well as my dissertation examining Wikipedia’s coverage of breaking news events, to compare the evolution of the article for the Sandy Hook Elementary School shooting with other Wikipedia articles about recent mass shootings.

In particular, this post compares the behavior of editors during the first 48 hours of each article’s history. The fact that there are 43 English Wikipedia articles about shooting sprees in the United States since 2007 should lend some weight to this much-ballyhooed “national conversation” we are supposedly going to have. I compare the Sandy Hook article against just six other articles, chosen for their recency and severity and including one international example, the 2011 Utøya massacre in Norway.

Wikipedia articles certainly do not break the news of the events themselves, but the first edits to these articles happen within two to three hours of the event unfolding. Once created, these articles attract many editors and revisions and grow extremely rapidly.

Figure 1: Number of changes made over time.
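
For readers who want to poke at the underlying data: every figure here is built from public revision histories. Below is a minimal sketch of how one might pull the relevant metadata through the MediaWiki API. The article title and 48-hour window are illustrative, and this is not the exact pipeline behind the figures.

```python
# Minimal sketch: pull revision metadata (timestamp, editor, size) for one
# article via the MediaWiki API, then count activity in the first 48 hours.
# Error handling and rate limiting are omitted for brevity.
import requests
from datetime import datetime, timedelta

API = "https://en.wikipedia.org/w/api.php"

def get_revisions(title):
    """Yield (timestamp, user, size) for every revision, oldest first."""
    params = {
        "action": "query", "format": "json", "prop": "revisions",
        "titles": title, "rvprop": "timestamp|user|size",
        "rvdir": "newer", "rvlimit": "max",
    }
    while True:
        data = requests.get(API, params=params,
                            headers={"User-Agent": "research-sketch"}).json()
        page = next(iter(data["query"]["pages"].values()))
        for rev in page.get("revisions", []):
            ts = datetime.strptime(rev["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
            yield ts, rev.get("user", ""), rev.get("size", 0)
        if "continue" not in data:
            break
        params.update(data["continue"])

revs = list(get_revisions("Sandy Hook Elementary School shooting"))
start = revs[0][0]
window = [r for r in revs if r[0] - start <= timedelta(hours=48)]
print(len(window), "revisions by", len({u for _, u, _ in window}), "editors")
```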

The Virginia Tech article attracted far and away the most revisions over the same span of time, ultimately accumulating enough revisions in the first 48 hours (5,025) to put it within striking distance of the top 1,000 most-edited articles in all of Wikipedia’s history. Conversely, the Oak Creek and Binghamton shootings, despite having 21 fatalities between them, attracted substantially less attention from Wikipedians and the news media in general, likely because these massacres had fewer victims and the victims were predominantly non-white.

The same pattern, with Virginia Tech as the exemplary case, shootings involving immigrants and minorities attracting less attention, and the other shootings behaving largely alike, also appears in the number of unique users editing an article over time:

Figure 2: Number of unique users over time.

These editors and the revisions they make cause articles to grow rapidly in size. Keep in mind that the average Wikipedia article runs around 3,000 bytes (a figure admittedly skewed by many short articles about things like minor towns, bands, and species), and articles above 50,000 bytes can raise concerns about excessive length. Despite the constant back-and-forth of users adding and copyediting content, the Newtown article reached 50 kB within 24 hours of its creation. However, in the absence of substantive information about the event, much of this early content is related to national and international reactions and expressions of support. As more background and context come to light, this list of reactions is typically removed, which shows up as a sudden contraction in article size: around 22 hours for Utøya, and around 36 hours for Newtown and Virginia Tech. As before, the articles about the shootings at Oak Creek and Binghamton are significantly shorter.

Figure 3: Article size over time.
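
The size trajectories in Figure 3 fall straight out of the same metadata: the `size` field is the article length in bytes after each revision. A sketch, reusing the `revs` list from the snippet above:

```python
# Sketch: plot article size over time from the revision metadata above.
# Requires pandas and matplotlib; the 50,000-byte line marks the rough
# threshold at which article length starts to raise concerns.
import pandas as pd

sizes = pd.Series({t: s for t, _, s in revs}).sort_index()
ax = sizes.plot()
ax.axhline(50_000, linestyle="--")
ax.set_ylabel("article size (bytes)")
```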

However, not every editor does the same amount of work. The Gini coefficient captures the concentration of effort (in this case, number of revisions made) across all editors contributing to the article. A Gini coefficient of 1 indicates that all the activity is concentrated in a single editor while a coefficient of 0 indicates that every editor does exactly the same amount of work.

Figure 4: Gini coefficient of editors’ revision counts over time.
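
For the curious, the Gini coefficient is simple to compute from per-editor revision counts. A sketch using the standard rank-weighted formula (the editor names and counts below are made up):

```python
# Sketch: Gini coefficient of per-editor revision counts.
# 0 = every editor does the same amount of work; values approaching 1 =
# activity concentrated in a single editor.
from collections import Counter

def gini(values):
    """Gini coefficient of a list of non-negative counts."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending-sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n

edits_by_user = Counter({"EditorA": 120, "EditorB": 10, "EditorC": 3})
print(round(gini(list(edits_by_user.values())), 3))
```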

Across all the articles, the edits over the first few hours are evenly distributed: editors make a single contribution, and others immediately jump in to do the same. Around hour three or four, however, one or more dedicated editors show up and begin to take a vested interest in the article, which manifests as a rapid centralization of activity. This centralization increases slightly over time across all articles, suggesting that these dedicated editors continue to edit after other editors move on.

Another way to capture the intensity of activity on these articles is to examine the amount of time elapsed between consecutive edits. Intensely edited articles may see only seconds between successive revisions, while less intensely edited articles can go minutes or hours. These data are highly noisy and bursty, so the plot below is smoothed with a rolling average of about three hours.

Figure 5: Waiting times between edits (y-axis is log-scaled).
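
A sketch of the smoothing, assuming a sorted list of revision timestamps like the one produced by the API snippet earlier:

```python
# Sketch: waiting times between consecutive edits, smoothed with a
# time-based rolling mean (roughly the three-hour window used in Figure 5).
import pandas as pd

def smoothed_waiting_times(timestamps, window="3h"):
    ts = pd.Series(pd.to_datetime(timestamps)).sort_values()
    gaps = ts.diff().dt.total_seconds().dropna()  # seconds between edits
    gaps.index = pd.DatetimeIndex(ts.iloc[1:])    # index each gap by edit time
    return gaps.rolling(window).mean()

# smoothed = smoothed_waiting_times([t for t, _, _ in revs])
# smoothed.plot(logy=True)  # log-scaled y-axis, as in Figure 5
```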

What’s remarkable is the sustained level of intensity over a two-day period. The Virginia Tech article was still being edited several times a minute even 36 hours after the event, while other articles were seeing updates every five minutes more than a day after the event. This means that even at 3 a.m., all these articles were still being updated every few minutes by someone, somewhere. There’s a general upward trend, reflecting intense activity immediately after the article is created followed by increasing time lags as the article stabilizes. But there’s also a diurnal cycle: edits slow between 18 and 24 hours after the event before quickening again, and a similar slowing and quickening appears around 20 hours and again around 44 hours, suggesting that information is released and incorporated in cycles as the investigation proceeds.

Finally, who is doing the work across all these articles? The structure of which users contribute to which articles also reveals interesting patterns. It appears that much of the editing is done by users who have never contributed to the other articles examined here, but a few editors contributed to each of these articles within four hours of its creation.

Figure 6: Collaboration network of articles (red) and the editors who contribute to them (grey) within the first four hours of their existence. Editors who’ve made more revisions to an article have thicker and darker lines.
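
A network like this is straightforward to assemble as a bipartite graph. Below is a sketch with networkx, assuming a dict mapping each article title to its (timestamp, editor) pairs; the structure and cutoff mirror the figure, not the exact code behind it.

```python
# Sketch: bipartite collaboration network of articles and early editors.
from datetime import timedelta
import networkx as nx

def collaboration_network(article_revs, hours=4):
    """article_revs: {article_title: [(timestamp, user), ...]}"""
    G = nx.Graph()
    for article, revs in article_revs.items():
        G.add_node(article, kind="article")
        start = min(t for t, _ in revs)
        early = [u for t, u in revs if t - start <= timedelta(hours=hours)]
        for user in set(early):
            G.add_node(user, kind="editor")
            # Edge weight = revision count; drawn as thicker, darker lines.
            G.add_edge(article, user, weight=early.count(user))
    return G
```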

Users like BabbaQ, Ser Amantio di Nicolao, and Art LaPella were among the first responders to edit several of these articles, including Sandy Hook. However, their revisions are relatively minor copyedits and reference formatting, reflecting the prolific work they do patrolling recent changes. Much of the substantive content comes from editors who have edited none of the other shooting articles examined here, and likely no other articles about shootings at all. In all likelihood, readers of these breaking news articles are mostly consuming the work of editors who have never previously worked on this kind of event. In other words, some of the earliest and most widely read information about breaking news events is written by people with fewer journalistic qualifications than Medill freshmen.

What does the collaboration network look like after 48 hours?

Figure 7: Collaboration network after 48 hours.

In total, 3,262 unique users edited one or more of these seven articles, 222 edited two or more, 60 edited three or more, and a single user, WWGB, had edited all seven within the first 48 hours of their creation. These cross-article editors sit at the center of Figure 7, where they connect to many of the articles on the periphery. The stars surrounding each article are the editors who contributed to that article and that article alone (in this corpus). WWGB appears to specialize not only in editing articles about current events but also in participating in a community of editors engaged in newswork on Wikipedia. These editors are not the first to respond (as above); rather, their work involves maintaining administrative pages that enumerate current events and mediating discussions across disparate current events articles. The ability of these collaborations to unfold as smoothly as they do appears to rest on Wikipedia editors with newswork experience either supplanting or complementing the work done by the amateurs who first arrive on the page.
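
Counting these overlaps is a one-pass tally over the same per-article edit logs. A sketch, assuming the `article_revs` structure from the network snippet above, already truncated to 48-hour windows:

```python
# Sketch: how many users edited at least k of the articles in the corpus.
from collections import Counter

def overlap_counts(article_revs):
    per_user = Counter()
    for revs in article_revs.values():
        for user in {u for _, u in revs}:  # count each article once per user
            per_user[user] += 1
    return {k: sum(1 for c in per_user.values() if c >= k)
            for k in range(1, len(article_revs) + 1)}
```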

Of course, this just scratches the surface of the types of analyses that could be done on these data. One might look at changes in the structure and pageview activity of each article’s local hyperlink neighborhood to see which related articles attract attention, examine article content for changes in sentiment or for patterns of introducing and removing controversial content and unsubstantiated rumors, or broaden the analysis to the other shooting articles. Needless to say, one hopes the cases for future analyses become increasingly scarce.

Brian Keegan is a post-doctoral research fellow in computational social science at Northeastern University. He earned his Ph.D. in the Media, Technology, and Society program at Northwestern University’s School of Communication, where his dissertation examined the dynamic networks and novel roles which support Wikipedia’s rapid coverage of breaking news events like natural disasters, technological catastrophes, and political upheaval. This article originally appeared on his website.
