The New York Times’ R&D Lab is building a quantified-self, semantic-analysis tool to track web browsing

LINK: blog.nytlabs.com ➚ | Posted by: Joshua Benton | January 14, 2014

Let’s say you work in a modern digital newsroom. Your colleagues are looking at interesting stuff online all day long: reading stimulating news stories, searching down rabbit holes you’ve never thought of. There are probably connections between what the reporter five desks down from you is looking for and what you already know, or vice versa. Wouldn’t it be useful if you could somehow gather up all that knowledge-questing and turn it into a kind of intraoffice intel?

A version of that vision is what Noah Feehan and others in The New York Times’ R&D Lab are working on with a new system called Curriculum. It started as an in-house browser extension he and Jer Thorp built last year called Semex, which monitored your browsing and, by semantically analyzing the web pages you visit, rendered it as a series of themes. Semex

presents your web history as a series of sessions and topics rather than URLs and timestamps. For example, one session might be a fifteen minute period this morning where I was researching the topic “humidity sensors.”

Semex was useful when it helped me remember where I was in a problem that I hadn’t worked on in awhile: seeing the sequence of topics I browsed made much more sense than a list of page titles. At the same time, Semex felt anticlimactic in the way that a lot of Quantified Self projects do: there was a sense of OK, we recorded all this; now what?
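To make the sessions-and-topics idea concrete, here is a minimal sketch of how a list of page visits might be folded into sessions. It is not Semex's actual code: the 10-minute gap threshold, the `(timestamp, topic)` input shape, and the function names are all assumptions for illustration, and the topic labels are taken as given rather than produced by real semantic analysis.

```python
from collections import Counter
from datetime import datetime, timedelta


def summarize(chunk):
    """Collapse a run of visits into one session labeled by its most frequent topic."""
    counts = Counter(topic for _, topic in chunk)
    return {
        "start": chunk[0][0],
        "end": chunk[-1][0],
        "topic": counts.most_common(1)[0][0],
    }


def group_into_sessions(visits, max_gap=timedelta(minutes=10)):
    """Split (timestamp, topic) visits into sessions wherever the pause exceeds max_gap.

    max_gap is an assumed threshold, not a value from the Semex project.
    """
    sessions, current = [], []
    for ts, topic in sorted(visits):
        # A long pause since the previous visit closes the current session.
        if current and ts - current[-1][0] > max_gap:
            sessions.append(summarize(current))
            current = []
        current.append((ts, topic))
    if current:
        sessions.append(summarize(current))
    return sessions


visits = [
    (datetime(2014, 1, 14, 9, 0), "humidity sensors"),
    (datetime(2014, 1, 14, 9, 5), "humidity sensors"),
    (datetime(2014, 1, 14, 9, 12), "arduino"),
    (datetime(2014, 1, 14, 11, 0), "elections"),
]
sessions = group_into_sessions(visits)
```

With this sample input, the three morning visits fall into one session and the 11:00 visit starts a second one.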

The “now what” was to shift Semex from your own web browsing to a shared environment — to move it from Quantified Self to Quantified Everybody, you might say.

…if Semex was most useful to me as a way to record my cognitive context, the state in which I left a problem, maybe I could share that state with other people who might need to know it. Sharing topics from my browsing history with a close group of colleagues can afford us insight into one another’s processes, yet is abstracted enough (and constrained to a trusted group) to not feel too invasive…

Each user in a group has a Chrome extension that submits pageviews to a server to perform semantic analysis and publish a private, authenticated feed. (I should note here that the extension ignores any pages using HTTPS, to avoid analyzing emails, bank statements, and other secure pages.) Curriculum is carefully designed to be anonymous; that is, no topic in the feed can be traced back to any one particular user. The anonymity isn’t perfect, of course: because there are only five people using it, and because we five are in very close communication with each other, it is usually not too difficult to figure out who might be researching a particular topic.
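The two safeguards Feehan describes, skipping HTTPS pages and keeping the feed anonymous, could look roughly like the sketch below. This is a guess at the shape of the logic, not the lab's implementation: the `Pageview` record, the `extract_topics` callback, and the decision to publish only topic labels (with no user ID or URL) are assumptions made for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Pageview:
    user_id: str  # hypothetical field; the real payload format is not described
    url: str


def should_submit(url):
    """Only plain-HTTP pages are analyzed; HTTPS (email, banking, etc.) is ignored."""
    return urlsplit(url).scheme.lower() == "http"


def to_feed_item(pageview, topics):
    """Build an anonymous feed entry: topic labels only, no user ID and no URL (assumed)."""
    return {"topics": sorted(set(topics))}


def publish(pageviews, extract_topics):
    """Filter out HTTPS pageviews, then turn the rest into anonymous feed items.

    extract_topics stands in for the server-side semantic analysis.
    """
    return [
        to_feed_item(pv, extract_topics(pv.url))
        for pv in pageviews
        if should_submit(pv.url)
    ]


views = [
    Pageview("user-a", "http://example.com/humidity-sensors"),
    Pageview("user-b", "https://bank.example.com/statement"),
]
feed = publish(views, lambda url: ["humidity sensors"])
```

Even in this toy version, the limit Feehan mentions still applies: stripping identifiers does not stop a five-person group from guessing who was reading about what.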

You could think of it as a natural progression from something like Fuego, which tracks what URLs are being shared in a given community on Twitter. Rather than analyzing what people are sharing, Curriculum analyzes what people are reading.

There’s a lot that I love about this idea. It matches up with something Heidi Moore and I (and others) were tweeting about last fall:

I wish every working journalist had an outlet for Reddit-style “Today I Learned” statements.

— Joshua Benton (@jbenton) September 11, 2013

@jbenton At the end of every day I just want to tweet out my open browser tabs. "HERE! THAT'S ALL I KNOW TODAY!"

— Heidi N. Moore (@moorehn) September 11, 2013

@moorehn Wouldn’t that be awesome? I’d love to follow someone’s browser tabs as much as their RSS feed.

— Joshua Benton (@jbenton) September 11, 2013

You don’t need me to tell you the potential privacy problems of something like this. If this sort of tool were actually used in a newsroom, I imagine its most immediate impact would be to send lots of people scurrying to their phones when they don’t want their colleagues virtually reading over their shoulder. (A few too many fantasy baseball topics showing up in the newsroom feed might lead to a trip to HR, one imagines.) I’d have to think it would take a special kind of work environment for people to find this sort of thing tolerable — it’d be a nightmare in most newsrooms.

But that very real issue aside, there’s a ton of potential with this line of thinking. Journalists learn so much information every day — and so little of it ends up anywhere other than their heads. With business models in flux, finding new ways to generate more value out of the expertise of journalists is critical. My thinking about that question has focused on more ways for that information to reach audiences, via niche blogging, social media, events, and other routes:

I’ve heard Jason Fried use the metaphor of a lumber company. Its products are 2x4s. In making those 2x4s, they produce a lot of sawdust.

— Joshua Benton (@jbenton) September 11, 2013

They can view all that sawdust as waste to be discarded. Or they can view it as the raw material of a new product.

— Joshua Benton (@jbenton) September 11, 2013

News organizations produce a TON of sawdust.

— Joshua Benton (@jbenton) September 11, 2013

But Curriculum raises an interesting point: that information can also be of great value when it simply spreads further within the same news organization. Feehan:

… Curriculum is kind of like a Fitbit for context, an effortless way to record what’s on our minds throughout the day and make it available to the people who need it most: the people we work with. The function Curriculum performs, that of semantic listening, is fantastically useful when people need to share their contexts (what they were working on, what approaches they were investigating, what problems they’re facing) with each other.

The Curriculum feed is truly a new channel of input for us, a stream of information of a different character than we’ve encountered before. Having access to the residue of our collective web travels has led to many questions, conversations, and jokes that wouldn’t have happened without it.

When asked on Twitter whether any of Curriculum might be made publicly available, Feehan replied:

@mgilbir hmm, to be honest I don't know – right now it's not designed as a product so much as a feed to inform other projects

— AKA MEDIA SYSTEM (@AKAMEDIASYSTEM) January 10, 2014
