Nieman Journalism Lab
Pushing to the future of journalism — A project of the Nieman Foundation at Harvard

Blogs: One person’s curation is another person’s scraping

Curation has become a popular term in media circles, in the sense of a human editor who filters and selects content, and then packages it and delivers it to readers in some way. Many people (including me) believe that, in an era when information sources are exploding online, aggregation and curation of some kind is about the only service left that people might be willing to pay for. That’s why it’s been interesting to watch one prominent website — All Things Digital, the online blog property that is owned by the Wall Street Journal, but run as a separate entity by Kara Swisher and Walt Mossberg — wrestling with how to handle that kind of aggregation, amid criticism from some prominent bloggers that it has been doing it wrong.

As described by Andy “Waxy” Baio in an excellently reported roundup of the brouhaha, the fuss seemed to start with comments from Wall Street Journal editor Robert Thomson about how Google and other aggregators of news are “parasites” in the intestines of the Internet, because they republish the content of others and then make money from it. Pretty soon, some bloggers were pointing out that All Things Digital did exactly the same thing in a section called Voices — namely, published long excerpts from a variety of prominent bloggers, displayed in exactly the same way as the rest of the site’s content, and surrounded by ads.

Josh Schachter, founder of Delicious, noted this behaviour in a Twitter message, and Metafilter founder Matt Haughey said that “apparently The Wall Street Journal’s All Things D does a reblogging thing. I sure wish they asked me first though. That’s a hell of a lot of ads on my ‘excerpt’.” Merlin Mann, who blogs at 43folders, said on Twitter that “republishing online work without consent and wrapping it in ads is often called ‘feed scraping.’ At AllThingsD, it’s called ‘a compliment.’”

In a conversation with Andy about the issue, Kara Swisher agreed that some of the excerpts were too long, and that the site didn’t really make it clear enough that these pieces were pulled from outside sources. This week, she wrote a long post describing how All Things D was changing the format of the Voices section to address some of these complaints, by shortening excerpts and providing better disclosure. “While we did not agree with all the complaints in the story,” she said, “the debate did make us realize we needed to be a lot clearer and more explicit about what we are doing, and to make those policies–which we had not posted in as much detail as we have, for example, about our ethics statements… more prominent and transparent.”

What I found particularly fascinating about this whole affair was the differing opinions on what All Things D had been doing. While Merlin Mann was incensed (and has since written a long polemic on the issue of re-use of his content), and Schachter and Haughey seemed miffed, some other writers that Andy Baio talked to said they were very happy to have the WSJ-owned site link to their content — something that they saw as a promotional effort on their behalf. “I think it’s helpful in driving some additional traffic to my blog,” said Eric Savitz of Tech Trader Daily (which is part of Barron’s, which in turn is owned by Dow Jones, parent of the WSJ). “It also gets me some higher visibility with a valuable audience. I have no complaints at all.” (Full disclosure: I know Kara Swisher and have had my content excerpted on the ATD site, and I was quite happy with the arrangement.)

In a nutshell, this is why re-use, and the related concept of fair use, is such a tough nut to crack. Google uses a few sentences from a newspaper article and links back to the paper site — some papers see that as beneficial, but others see that as Google stealing something, and trying to repay them with a few cheap trinkets (i.e., web traffic). A site like All Things D excerpts content that it sees as worthy, and displays it in such a way as to elevate it to the same status as its own content — some writers see that as a favour and are happy to receive it, while others feel the site is taking something without permission and trying to give the impression that it’s theirs. Some authors want Google to scan and display chunks of their books, so that readers can find them — others see that as copyright infringement or even outright theft.

I wish I had the answer to this problem, but I don’t. Obviously, checking with the author before you excerpt something — which All Things D apparently didn’t do in these cases — is one way to avoid problems. But how is Google supposed to do that? How is any sufficiently large aggregator or curator supposed to do that? Should the onus be on the aggregator to ask, or should the onus be on the content creator to protest and ask that it be removed? Lots of questions, very few hard answers.

                                   
  • Pingback: When does curation become scraping?

  • http://thepulse.ca Shawn Petriw
  • Pingback: Matthew Ingram: ‘One person’s curation is another person’s scraping’ | Journalism.co.uk Editors' Blog

  • Pingback: Você já ouviu falar em blog’s curation? Mathew Ingram, do Nieman Journalism Lab, explica | Converge Magazine

  • Isaac

    “But how is Google supposed to do that?”

    Robots.txt is a start. A news site could put that in place to stop the Google spiders indexing their content.
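To illustrate the robots.txt mechanism Isaac mentions, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt contents, the crawler name "SomeOtherBot", and the news.example.com URLs are hypothetical, chosen only for illustration. The sketch shows how a well-behaved crawler would read such a file; robots.txt is advisory, so it only stops crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary news site. The rules below are
# illustrative assumptions, not taken from any real publisher.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Googlebot is disallowed everywhere on this hypothetical site.
googlebot_allowed = parser.can_fetch(
    "Googlebot", "https://news.example.com/story.html"
)

# Any other crawler falls under the "*" rules: only /private/ is off limits.
other_allowed = parser.can_fetch(
    "SomeOtherBot", "https://news.example.com/story.html"
)
other_private = parser.can_fetch(
    "SomeOtherBot", "https://news.example.com/private/notes.html"
)

print(googlebot_allowed, other_allowed, other_private)
```

A rule like this is all-or-nothing per crawler and path: it can keep a crawler away from content entirely, but it cannot express terms such as "excerpt no more than a few lines".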

  • http://blog.canal.cl/ Ignace Rodríguez / @micronauta

    Perhaps we can boil it down to something like this: How about if the difference between curation and scraping is compensation? And it’s the attention economy, so compensation does not necessarily have to mean money.

    But who decides what kind of compensation is enough? The content creator. Whether it’s attention or traffic driven to their site, or shared revenue from ads at the scraping site, keep the source happy and all can be well.

    So the question then is how can the content creator specify, monitor and enforce compensation. The possible missing piece in the jigsaw puzzle might be something like Creative Commons, a simple solution.

  • http://www.arjunram.com Arjun Ram

    @Isaac
    Herein lies the problem. The Google bot for search and news is the same! They shouldn’t be! They get away with it with the excuse that they are a search engine! Unfair advantage!

    Not too many media are willing to give up Google search juice!

  • http://www.mathewingram.com/work Mathew Ingram

    @Shawn — yes, I should have mentioned that, but couldn’t find the link for some reason when I posted this. Merlin was definitely having some fun :-) And then Kara Swisher responded on Twitter:
    http://twitter.com/karaswisher/status/1571645104

  • http://a.wholelottanothing.org/ Matt Haughey

    “Obviously, checking with the author before you excerpt something — which All Things D apparently didn’t do in these cases — is one way to avoid problems. [...] How is any sufficiently large aggregator or curator supposed to do that?”

    The issue wasn’t really permission for excerpts but that ATD went beyond a pull quote and a standard “hey go check out this great blog post of Matt’s” thing.

    They grabbed our photos, gave us a byline, excerpted half my post, covered it in ads, had comments, and then just a tiny link at the end to get to my original post. It confusingly looked like I worked for the WSJ and when I confirmed with friends that I did not, I was miffed at seeing my photo and name and ads without my permission.

    It’s a huge grey area, but there’s a pretty good standard out there for quoting others’ work, and it’s usually clearly showing the author of a post, then blockquote/indent and show an excerpt and let the reader know who wrote that and where they can find the whole thing.

    My issue was mostly about it looking confusingly like I wrote the post for the WSJ.

  • http://www.mathewingram.com/work Mathew Ingram

    Thanks for the comment, Matt. I think you were right to complain — the way it was displayed before the changes definitely made it hard to tell who worked for All Things D and who didn’t. Do the changes satisfy you though? Or would you rather that ATD didn’t use your stuff regardless?

  • Pingback: Matthew Ingram: ‘One person’s curation is another person’s scraping’ | DAILYMAIL.ME

  • Pingback: Blogs: One Person’s Curation is Another Person’s Scraping | Mathew Ingram | Voices | AllThingsD

  • http://a.wholelottanothing.org/ Matt Haughey

    I am satisfied with the changes, and even if they still did things as they did at the start — if they asked first, I’m sure I would have been flattered and said sure to it, even with the byline, photo, ads, and comments.

    It felt like they were trying to take something over without my permission before, but it’s now very clear that I wrote something and that there are no comments there.

  • http://toughloveforxerox.blogspot MichaelJ

    I think Ignace is on the right track, when he says “The possible missing piece in the jigsaw puzzle might be something like Creative Commons, a simple solution.”

    If a blogger puts a creative commons license on the blog, it gives her the choice of how that content is meant to be used. This may already exist, I’m no expert on Creative Commons, but if there were a category that said something like “Use prohibited in an ad supported site” that puts the use power in the hands of the creator.

    I think that the legal department of large media companies would not allow anyone to take the risk of a lawsuit just to quote some words on a website.

    The irony is that I don’t think many content creators would take that choice. The issue for them is more about power and respect than the “stealing” of words.

  • http://www.FindingDulcinea.com Mark Moran

    This discussion is little more than a debate about what constitutes fair use in the new world order. It’s clear that most people think the former practices of AllThingsD included too much material, thereby lessening the need for a reader to visit the source work, and that the current practices are firmly in fair territory. The disconnect for me is in the first paragraph of this article, where the practice of finding and linking to a single article, with no commentary, is called “packaging.” I’ve seen very few sites that do more than aggregate a slew of links on the subject from the past 48 hours. To provide real value to a user, a curator must scour the Web for the best links that provide the full background on and view of a subject that users crave, and then add commentary to place the links in proper context.

  • http://www.ovationtv.com megan

    can’t there be some sort of mechanism where the content creator can check a few boxes like:
    - shareable
    - excerptable in under 10 lines
    - or excerptable in no more than 20 lines

    just before they publish or when they set up a wordpress etc? or make it standard to have a statement at the top left of all blogs that says creator allows sharing, excerpting up to 10 lines etc.

    then if any other curator/aggregator doesn’t follow these the creator has something to defend themselves?

  • Pingback: Required reading: Aggregating the news « Supraprint

  • Ruth Ayres

    Using someone’s creative output w/o permission is called stealing.
    This prehistoric message is brought to you by,
    Ruth

  • Pingback: » Blogs: One Person’s Curation is Another Person’s Scraping [Voices] True HelloWorld Story

  • lemel

    But there is already a time- and market-proven solution pattern for this exact scenario: pay the contributor. Why invent some wacky new permissions scheme?

  • Pingback: Will People Pay for Curation? « Predicate, LLC | Editorial + Content Strategy

  • Pingback: Josh shachter | UpStairsGallery
