What’s New in Digital and Social Media Research: How editors see the news differently from readers, and the limits of filter bubbles

Predicting what goes viral, sourcing the Arab Spring, and Twitter in power vs. out of power: all that and more in this month’s roundup of the academic literature.

By John Wihbey @wihbey March 31, 2014, 1 p.m.

Editor’s note: There’s a lot of interesting academic research going on in digital media — but who has time to sift through all those journals and papers?

Our friends at Journalist’s Resource, that’s who. JR is a project of the Shorenstein Center on Media, Politics and Public Policy at the Harvard Kennedy School, and they spend their time examining the new academic literature in media, social science, and other fields, summarizing the high points and giving you a point of entry. Here, John Wihbey sums up the top papers in digital media and journalism this month.

Recent weeks have brought a deluge of new findings about the digital media space, crowned by the Pew Research Journalism Project’s 2014 State of the News Media report. (Here’s the Nieman Lab summary.)

The American Press Institute also issued an important new report, “The Personal News Cycle,” which finds that demographics matter less in terms of news-seeking and “that some long-held beliefs about people relying on just a few primary sources for news are now obsolete.” (See the Lab’s writeup.) Columbia Journalism School’s Tow Center for Digital Journalism also released a new report by Nicholas Diakopoulos, “Algorithmic Accountability Reporting: On the Investigation of Black Boxes,” as well as findings from Anna Hiatt’s “Future of Digital Longform Project.” But the insights don’t end there. From gatekeeping debates to filter bubbles to viral content, answers are flowing in from many corners of the research world, as you’ll see below.

“Shares, Pins, and Tweets: News Readership From Daily Papers to Social Media”: From Duke University, published in Journalism Studies. By Marco Toledo Bastos.

Drawing two weeks of data from The New York Times’ and The Guardian’s APIs, as well as the APIs of various social media platforms, Bastos set out to answer an important question: How much overlap is there between what editors choose to focus on and what social media users grab on to? This revitalizes an old debate over editorial judgment, gatekeeping, and norms of “newsworthiness.” The author looks at about 16,000 articles on the news sites from late 2012; he also analyzes article links circulated on Facebook, Twitter, Pinterest, Google+, Delicious, and StumbleUpon. Some raw data findings relating to links posted on social media prove interesting: Times articles earned an average of 39 retweets on Twitter and 445 shares on Facebook, while Guardian articles saw an average of 50 retweets and 190 Facebook shares.

Bastos concludes: “The results show that social media users express a preference for a subset of content and information that is at odds with the decisions of newspaper editors regarding which topic to emphasize.” Social media users tend to favor hard news over soft news, especially on Twitter. Only a quarter of the Times sports articles studied, for example, ever showed up on Twitter or Facebook. Likewise, news editors’ preferences for more articles about the economy do not track with social media users’ apparent preferences. Further, Bastos says, “although most news sections are uniformly and symmetrically distributed across newspapers and social networking sites, we found remarkable differences on the number of news items about arts, science, technology, and opinion pieces, which are on average more frequent on social networking sites than on newspapers.” The variation may be partly explained by the more urban, educated, and youthful characteristics of social media users, the study notes.

“Ideological Segregation and the Effects of Social Media on News Consumption”: From Carnegie Mellon University and Microsoft Research. By Seth Flaxman, Sharad Goel, and Justin M. Rao.

A noteworthy contribution to the “filter bubble” debate, the research (one of the “largest studies of online news consumption to date”) suggests that the overall ideological segregation attributable to online media channels and personalization is rather limited. Flaxman, Goel, and Rao analyze a dataset of the nearly complete (and anonymized) web browsing habits of 1.2 million Internet users over a three-month period, some 2.3 billion pageviews. The researchers use machine learning tools to assess the ideology of the persons studied, looking at county-level voting patterns and demographics.

For descriptive news articles accessed through social media, the level of ideological segregation is “marginally higher” than for those read by visiting a news site directly. The pattern is “more pronounced” for opinion pieces, and there is a higher degree of segregation in web search, roughly the “ideological distance between the centrist Yahoo! News and the left leaning Huffington Post (or equivalently, CNN and the right-leaning National Review).” Flaxman, Goel, and Rao conclude that a “relatively small amount of online news consumption is driven by the polarizing social and search channels, and opinion pieces which are typically the focus of laboratory studies constitute just 6% of articles relating to world or national news…[W]e find that individuals typically consume descriptive reporting, and do so by directly visiting a handful of their preferred news outlets.” Thus, while it is true social channels and search lead to segregation and filter bubbles, people are not primarily getting their news through those channels, and the “overall impact of these factors appears to be limited at this time.”

Related: For a more precise, quantitative sense of how much Google actually filters results, see “Personalization of Web Search,” a 2013 paper by a team at Northeastern University. That study finds that on average “11.7% of search results show differences due to personalization.”

“Can Cascades be Predicted?”: From Facebook, Stanford University, and Cornell University. By Justin Cheng, Lada A. Adamic, P. Alex Dow, Jon Kleinberg, and Jure Leskovec.

A group of in-house data scientists at Facebook and select academic partners are increasingly sharing publicly some insights from the holy grail of network data. Hence this paper, which looks at how information cascades unfold and whether they can be predicted. It analyzes about 151,000 photos uploaded to Facebook and shared 9.2 million times in June 2013. The network scientists try to figure out if it is possible to predict a viral cascade (multiple generations of peer-to-peer sharing, originating from a single “seed” or node). They work backwards, doing detective work to try to pick out viral signatures. Tentative findings include: Viral cascades typically start fast; early reproduction speed, the initial velocity across the network, seems to be a key marker. Further, as the photo spreads, it begins to matter less (the variable diminishes in importance) who spread it originally, and the actual kind of content matters less, though captions do seem to matter. Having a broader first generation of resharers also counts. Because the researchers could see the same photos being uploaded by different persons, they also noted that a photo was more likely to go viral the first time it was uploaded than in later instances of uploading.

Related: Another hugely important paper in this area is “The Structural Virality of Online Diffusion,” by Sharad Goel, Jake Hofman, and Duncan Watts of Microsoft Research and Ashton Anderson of Stanford. They analyze a billion links (news, images, videos, petitions) shared on Twitter. One out of every 3,000 links produced a “large event,” or a sharing phenomenon that reached 100 additional persons beyond the seed node; but truly viral events (many multiple generations of sharing, several thousand adoptions at least) occurred only about once in a million instances. The researchers finally define what it is to be a viral event: There is an average of at least 10 nodes between any points on the entire network graph, suggesting the content has genuinely travelled far by virtue of grassroots peer-to-peer sharing, not just a big broadcast.
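To make the idea of structural virality concrete, here is a minimal Python sketch. It assumes the measure is the mean shortest-path distance between all pairs of nodes in a sharing tree, which is one reading of the description above. The function name and the two toy cascades are hypothetical and are not drawn from the paper’s data.

```python
from collections import deque
from itertools import combinations


def mean_pairwise_distance(edges):
    """Average shortest-path length over all unordered node pairs
    of a connected, undirected sharing tree given as (parent, child) edges."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    def distances_from(source):
        # Breadth-first search gives hop counts from one node to all others.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        return dist

    nodes = sorted(adjacency)
    all_dist = {node: distances_from(node) for node in nodes}
    pairs = list(combinations(nodes, 2))
    return sum(all_dist[a][b] for a, b in pairs) / len(pairs)


# Hypothetical toy cascades, each reaching four people beyond the seed.
# Broadcast: everyone reshares directly from the seed.
broadcast = [("seed", "user0"), ("seed", "user1"), ("seed", "user2"), ("seed", "user3")]
# Chain: each person reshares from the previous person.
chain = [("seed", "a"), ("a", "b"), ("b", "c"), ("c", "d")]

broadcast_score = mean_pairwise_distance(broadcast)
chain_score = mean_pairwise_distance(chain)
print(broadcast_score, chain_score)
```

With the same number of people reached, the chain scores higher than the broadcast, which is the distinction between peer-to-peer spread and a single big broadcast that the researchers are after.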

“Social, Search and Direct: Pathways to Digital News”: From the Pew Research Journalism Project. By Amy Mitchell, Mark Jurkowitz, and Kenneth Olmstead.

In this hugely insightful new report, Mitchell, Jurkowitz, and Olmstead analyze data suggesting that those who reach news through social media channels do not spend much time with it, and that news is consumed mostly “incidentally” on social platforms. Some of their salient conclusions are: “among users coming to these news sites through a desktop or laptop computer, direct visitors spend, on average, 4 minutes and 36 seconds per visit. That is roughly three times as long as those who wind up on a news media website through a search engine (1 minute 42 seconds) or from Facebook (1 minute 41 seconds). Direct visitors also view roughly five times as many pages per month (24.8 on average) as those coming via Facebook referrals (4.2 pages) or through search engines (4.9 pages). And they visit a site three times as often (10.9) as Facebook and search visitors.” Pew breaks out some of the other top insights here.
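The “roughly three times” and “roughly five times” comparisons can be checked directly from the quoted figures. The short Python sketch below does that arithmetic; the variable and function names are illustrative, and only the numbers come from the Pew report as quoted above.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes-and-seconds duration to total seconds."""
    return minutes * 60 + seconds


# Average time per visit, by how the visitor arrived (figures as quoted).
visit_seconds = {
    "direct": to_seconds(4, 36),
    "search": to_seconds(1, 42),
    "facebook": to_seconds(1, 41),
}

# Average pages viewed per month, by how the visitor arrived (figures as quoted).
pages_per_month = {"direct": 24.8, "facebook": 4.2, "search": 4.9}


def ratio_to_direct(metric):
    """How many times larger the direct-visitor figure is than each other channel."""
    return {
        channel: round(metric["direct"] / value, 2)
        for channel, value in metric.items()
        if channel != "direct"
    }


time_ratios = ratio_to_direct(visit_seconds)
page_ratios = ratio_to_direct(pages_per_month)
print(time_ratios)
print(page_ratios)
```

By this arithmetic, direct visits last about 2.7 times as long as search or Facebook visits, and direct visitors view between five and six times as many pages per month.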

Related: This all feeds into a larger recent conversation also joined by Chartbeat’s Tony Haile and others at Upworthy about the relative importance of social sharing and the need to measure quality engagement in new ways, perhaps through “attention minutes.” Meanwhile, a new report covering January 2014 by analytics platform Parse.ly suggests that Facebook is becoming an increasingly big part of driving traffic to news sites (26 percent in that period), while Google’s share of referrals to news sites is dropping (38 percent).

“Sourcing the Arab Spring: A Case Study of Andy Carvin’s Sources on Twitter During the Tunisian and Egyptian Revolutions”: From University of British Columbia and University of Minnesota, published in Journal of Computer-Mediated Communication. By Alfred Hermida, Seth C. Lewis, and Rodrigo Zamith.

The study looks at the mix of sources Andy Carvin used during his social media-focused reporting for NPR. Hermida, Lewis, and Zamith examine the mix of “elite” sources and “alternative voices” in a dataset of 60,000 tweets during 2010-11; they plug this data into a wider debate over how the new network ecosystem is changing the mix of media voices and sources. The researchers conclude that “nonaffiliated activists accounted for the greatest single share of tweet mentions, overall (35.3%) and for Egypt (37.5%).” However, “in the overall population of individual sources, mainstream media employees accounted for the largest group by far (26.7%).” This general mix of evidence, the study concludes, suggests a “new paradigm of sourcing at play.”

Other noteworthy papers in brief

“Networked Press Freedom and Social Media: Tracing Historical and Contemporary Forces in Press-Public Relations”: From the Annenberg School, USC, published in Journal of Computer-Mediated Communication. By Mike Ananny.

Ananny argues that the contemporary social media policies of some news organizations still fit into an age-old “defensive” and “conservative” pattern of distancing media members and institutions from their audiences and mitigating risks. The audience is seen in utilitarian terms — as a way of generating traffic or merely producing more efficient sourcing. Thus, old gatekeeping customs emerge in new clothes.

“Twitter in Politics: A Comprehensive Literature Review”: From the University of Bamberg (Germany). By Andreas Jungherr.

This giant literature review — 115 studies, from around the globe, across many election cycles — finds that research typically falls into one of three categories: “the use of Twitter by politicians and campaigners, the use of Twitter by publics in election and issue campaigns and the use of Twitter by various users to comment on mediated campaign events — such as televised debates, party conventions or election day coverage.” Jungherr concludes that despite the somewhat haphazard and emerging nature of the field, there are some “stable findings” being arrived at. For example, “candidates belonging to opposition parties take more frequently to Twitter than candidates from parties in government.”

“Seeking and Sharing Health Information Online: Comparing Search Engines and Social Media”: From Microsoft Research. By Munmun De Choudhury, Meredith Ringel Morris, and Ryen W. White.

This study examines what kinds and amounts of health information people publicly disclose on Twitter, and compares this with search engine use for health-related inquiries. It turns out people share a lot of health information publicly on Twitter, although “high-stigma” conditions are not as frequently shared. Based on this evidence, the researchers hypothesize some needed digital innovations: “New kinds of health information search systems may be built that support standing queries over search and/or social media to keep users apprised of new developments related to different common health concerns, since seeking new research about conditions and diversity of health content were the goals of many respondents.”

Also see a related Microsoft Research/Carnegie Mellon/University of Washington/MIT paper “Is There Anyone Out There? Unpacking Q-and-A Hashtags on Twitter.”

“Network Issue Agendas on Twitter During the 2012 U.S. Presidential Election”: From the University of Alabama, University of Texas at Austin, University of North Carolina at Chapel Hill, published in the Journal of Communication. By Chris J. Vargo, Lei Guo, Maxwell McCombs, and Donald L. Shaw.

This “Big Data” study (38 million tweets analyzed) looks at how Republicans and Democrats operated differently on Twitter and how they responded to different forms of media — both “vertical,” or traditional media, and “horizontal,” or niche forms of media that target like-minded communities. The researchers conclude that “although vertical media could best predict Obama supporters’ behaviors on Twitter, the Republican horizontal media offered the greatest predictor power in explaining Romney supporters’ network agenda.”

“Political performance, boundary spaces, and active spectatorship: Media production at the 2012 Democratic National Convention”: From the University of North Carolina, published in Journalism. By Daniel Kreiss, Laura Meadows, and John Remensperger.

An ethnographic look at the 2012 DNC and media produced there, this paper provides some interesting insights into how conventions — now so ritualized and scripted that journalists find them impossible to cover — can actually empower attendees as “active spectators.” Social media at a convention now allow non-elite participants opportunities for “public critique and accountability over both political and journalistic actors.”

“Siren songs or path to salvation? Interpreting the visions of Web technology at a UK regional newspaper in crisis, 2006–2011”: From Bournemouth University (U.K.), published in Convergence. By Phil MacGregor.

This five-year case study on Britain’s Northern Echo newspaper shows how technological adoption in a media organization is not “unidirectional”: rather, it is “neither smooth nor uniform and is marked by uncertainty as to which of several actions is rational. Doubt is spread throughout the hierarchy.” The observations will be familiar to many in news organizations, but for scholars it’s another good data point showing a complex transition to the web.

Photo by Anna Creech used under a Creative Commons license.
