Aggregators, curators, and indexers: There’s a difference, and it matters

By C.W. Anderson @chanders June 1, 2010, 1 p.m.

Aggregation. Curation. Indexing. They’re all the same, aren’t they? Ask any serious online journalist or new media entrepreneur, and the answer will be quick and obvious: of course not! But in the public debate over the future of journalism — especially the debate as framed by legal analysts and public officials — the words often get thrown around as if they are identical. Ordinarily, such word quibbling would seem a little sad. But in the current context, where every aspect of journalism is up for grabs and concepts like “the hot news doctrine” are discussed in serious tones, words and definitions mean a great deal. So I thought it might be worth a little time thinking about what we mean by aggregation, by curation, and by indexing. In other words: if you’re an “aggregator,” what is it, exactly, that you do?

To show how these terms are increasingly lumped together, and some of the problems this might cause, I want to highlight the first couple of paragraphs from the written materials distributed at the Online Media Legal Network’s “Journalism’s Digital Transition,” a conference I attended at Harvard a few weeks ago. The conference, by the way, was great, and I don’t mean to pick on the OMLN. But I did think that the discussion of aggregation included in their CLE (Continuing Legal Education) materials really summed up the issues that I wanted to get at in this post. In the document “News Aggregation and Copyright Fair Use,” conference attendees read:

One of the hottest topics in copyright law these days is the rise of the news aggregator, from Google News to the Huffington Post … debate arises when third-parties get into the act [of] reselling and profiting from information generated by traditional media organizations.

Of course, building a business model around monetizing another’s website content isn’t novel, and methods for doing so have been around for almost as long as the Internet has been considered a viable commercial entity. Consider the practice of framing, or superimposing ads, onto linked websites … News aggregators, which take information from multiple websites and display it on a single page, providing a convenient one-stop resource for readers, are merely the latest flavor-of-the-week.

Though Google News may be the most well known commercial news aggregator, there are many others, such as the Huffington Post and Newser.com. Some use only headlines and links, others copy full (or nearly full) articles and photos. Nearly all receive ad revenue, many based on page views that, copyright owners allege, are being diverted from websites that originate the content.

Are Google News, Huffington Post, and Newser.com the same? How about the other online organizations traditionally tossed into the mix, such as Gawker? If you view the online news ecosystem as basically bifurcated into two categories, content originators and content reusers, then this view of the world might make sense. In the above model, the primary issue isn’t what these sites actually do all day, but the fact that they “receive ad revenue, many based on page views that, copyright owners allege, are being diverted from websites that originate the content.” And yet, as soon as you start to conceptually differentiate between Google News and the Huffington Post, it becomes clear that there’s a much more complex news ecosystem out there.

So what’s actually going on online? I thought it might be interesting to take one of our very own Lab posts, Mark Coddington’s all-around smashing This Week in Review, and parse out the ways that Mark engages in both what I’d call “aggregation” and “curation.” In essence, I think the upper sections of This Week in Review are fundamentally different from the bottom, concluding section, and the differences between the two sections point to different ways of doing online newswork.

The first dozen paragraphs of TWIR are usually broken down into three or four “hot topics” that are big in the future-of-journalism world that week. As Mark told me when I emailed him and asked him to explain his thinking behind This Week in Review, the upper sections

explore a discussion — a news development with commentary surrounding it, or ideas that spark responses and thus launch (or, usually, continue) a conversation. With those sections, I see myself as mapping out a discussion — explaining who’s on what side, what each person is saying and where that places them in relation to everyone else…If I see some substantive discourse coalescing around an article, that’s more likely to merit its own section because there are several connections I feel I need to explain (i.e. Person A said this, Person B responded with this, and Person C and D reminded both A and B of this and this).

Let’s take one recent TWIR as an example. The hot topics picked by Mark involved (1) the continuing controversy over Facebook, (2) a discussion of iPad apps, (3) New York Times and Wall Street Journal paywalls, and (4) finally, a good overview of recent pieces on new digital news experiments. I’d call this first, lengthiest section of the Week in Review “content aggregation and analysis.” In the old days I would have just called it “blogging.”

The topics Mark discusses in This Week in Review emerge from a deep immersion in the conversation about the future of journalism, and a lengthy period of active listening to what people are saying. I follow future-of-journalism news pretty closely, and I’ve almost never disagreed with Mark’s analysis about what the important topics of the week are. In short, I trust his judgment. But it’s a judgment that stems from deep, active engagement in the topic at hand.

The way Mark highlights the contours of the debate is through linking back to his original sources. The discussion of Facebook contains 17 links in four paragraphs.

Mark occasionally (but not often) weighs in on one of the debates, but he does it pretty subtly, and the bulk of This Week in Review is definitely taken up with summarizing and translating what others are saying.

The second part of TWIR — and it’s usually just a few paragraphs — is called “Reading Roundup.” I’d call this part of This Week in Review “curation,” and it strikes me as pretty different from the rest of the piece. It’s not as centered around debates, and the links tend to go to online content which is more “think-piecey.” In this section, Mark seems to be listening a little bit less, and exercising a bit more personal judgment. I hear him telling me: “Hey! You’ve followed the piece to the end, which tells me you really care about this issue. Since I think we share similar interests, you might like these pieces too!” Or as Mark put it when I quizzed him about the difference:

You’re right — there is a difference between the “reading roundup” and the rest of the weekly review posts…with the reading roundups, I’m merely pointing the reader toward an interesting link without substantively explaining its connection to the rest of the journalism-in-transition world. Essentially, the reading roundup is like me inviting you to a party, while the main sections are like me walking you through a room at that party, introducing you to people, explaining who’s who, and giving you a sense of who you might enjoy talking to.

Finally, compare both of these forms of writing to something like Google News, which uses complex algorithms to determine what the hot topics of the minute are, what counts as a spotlight story, and how to rank stories in order of originality and importance. If Google News looks like anything, it’s a phone book — or one of those yearly news indexes in the big green binders you used to encounter in libraries, just more up to date. There isn’t the same sense of “listening,” the process of judgment seems different, and most importantly, there isn’t the same kind of interstitial commentary surrounding the links. For me, what Google News and other sites do might productively be called “indexing.”

Because this blog post is already over 1,300 words, I’m not going to get into the question posed by Ken Doctor: Can’t we just call all this stuff “content arbitrage”? Maybe that’s the subject for another post, but the short answer is I don’t think you can. I think we need to begin to compare the new forms of journalistic work that exist online, not just to some imaginary ideal of “content creation” versus an evil “repurposing,” but to each other.

Ultimately, why does all this matter? Is there a practical upshot to all this linguistic parsing?

For me, the lesson is simple. Anytime you hear someone talk about Google News, The Huffington Post, Gawker, blogging, aggregating, curation, and indexing as if they are the same phenomenon, ignore them. And if they attach that discussion to a set of policy recommendations, without acknowledging the full complexity of what it is people actually do when they aggregate, curate, and index information — well, then you should put your fingers in your ears and run in the other direction.
