SpeakerText wants to free all your words from the prison of your videos

By Joshua Benton @jbenton Jan. 13, 2010, 12:35 p.m.

There’s a school of thought that says video is the future of information, that rich media is the endpoint of the evolution of text. I don’t know that I buy that, since text still has so many advantages over video: its scannability, its searchability, how much easier it usually is to create and polish. But some of those edges might be temporary, as technology evolves to solve away some of video’s problems.

SpeakerText, a new startup, is trying to become one of those problem solvers by directly tying videos to their corresponding words.

Cofounder Matt Mireles, 29 and an occasional commenter around here, used to dream about being a war correspondent for The New York Times Magazine. But “reading Romenesko and getting depressed” pushed his interest more toward the intersection of journalism and technology.

Here’s his argument: It’s relatively easy for the value of a piece of text content to be shifted from its creator to someone else. Let’s say your news organization breaks big news. What happens next? Other people start writing about your big news: summarizing it, excerpting it, putting their own spin on it. They may also link to it, but a lot of those links don’t get clicked on, and the “credit” in terms of eyeballs ends up spread around a lot of different sites, not just the one doing the original reporting. Or, as Mireles puts it: “Text is easily commoditized.”

But the same isn’t quite as true for video content. It’s a lot harder to satisfyingly summarize a piece of video for a blog post. (Not impossible — harder.) It’s also a lot less excerptable: If you’ve posted an hour-long video, and the juicy stuff is 39 minutes in, it’s not always easy to direct people to that spot without recutting the video.

SpeakerText tries to tackle those problems by linking points in a video with their transcripts, allowing text to be a navigational tool to locate specific points in a video. (You may have seen The New York Times do something similar for major Obama speeches.)

Here’s an example of how SpeakerText works, using a video of the NYT’s Bill Keller we wrote about back in October:

Pressing play will start both the video and the movement of highlighted text down the accompanying transcript. If you click at any point in the transcript, the video should jump to that point. This has obvious use for speeches, lectures, interviews or anything else that combines multimedia with a lot of words.
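The two-way link described above can be modeled as a lookup over time-stamped transcript segments. The Python below is a minimal sketch under stated assumptions, not SpeakerText's actual implementation: the `Cue` type, the function names, and the sample sentences are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """One transcript segment tied to a start time (hypothetical structure)."""
    start: float  # seconds from the beginning of the video
    text: str


def active_cue_index(cues, playback_time):
    """Return the index of the cue to highlight at playback_time.

    Assumes cues are sorted by start time. Returns None before the first cue.
    """
    starts = [cue.start for cue in cues]
    index = bisect_right(starts, playback_time) - 1
    return index if index >= 0 else None


def seek_time_for_click(cues, clicked_index):
    """Return the time the player should jump to when a cue is clicked."""
    return cues[clicked_index].start


# Made-up sample data for illustration only.
cues = [
    Cue(0.0, "Opening remarks."),
    Cue(12.5, "A second sentence."),
    Cue(30.0, "A third sentence."),
]
```

As the video plays, a player would call `active_cue_index` with the current playback time to decide which sentence to highlight; a click on a sentence goes the other way, through `seek_time_for_click`.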

And SpeakerText also allows video to be shared at the quote level. For instance, in that Keller video, the one line lots of people seized on is where he seems to (maybe?) confirm the existence of a new Apple tablet device, calling it “the impending Apple slate.” With the tool, I can link directly to that quote, 8:24 in.
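Quote-level sharing comes down to carrying a time offset in the link. Here is a rough sketch of that idea in Python; the `t` query parameter and the example.com address are placeholders I made up, since I don't know how SpeakerText actually encodes its links.

```python
from urllib.parse import urlencode


def timestamp_to_seconds(stamp):
    """Convert "M:SS" or "H:MM:SS" into a whole number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def quote_link(base_url, stamp):
    """Build a link carrying the quote's offset in a hypothetical `t` parameter."""
    return f"{base_url}?{urlencode({'t': timestamp_to_seconds(stamp)})}"


print(timestamp_to_seconds("8:24"))  # 504
print(quote_link("https://example.com/video", "8:24"))
```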

Room for improvement

It’s not perfect. For one thing, it doesn’t yet work with all video providers; we at the Lab use Vimeo as our video player, which doesn’t work with SpeakerText, so I had to reupload the Keller video to YouTube to make it work. The time-tagging isn’t perfectly precise; I had a few tags that seemed to float a few seconds away from the exact moment I tied them to.

As for the transcriptions, you either need to provide them yourself or pay for them to be created by the anonymous armies of Amazon’s Mechanical Turk. I love Mechanical Turk — I use it to do all the video transcriptions on this site. But its transcriptions can be spotty — from misplaced commas and incorrect proper names to varying interpretations of whether every last “um,” “er,” and “uh” should be considered worthy for the permanent record.

But the biggest hassle is that the connections between the transcripts and the video must be made manually, by time-stamping the text. Mireles suggests having an intern do it, but since I’m internless, it was a bit of a slog. For an hour-long video, time-stamping at the sentence level would be a big pain.

Mireles told me that technology to automate the time-stamping is available for purchase, and that buying it is part of the plan as the company moves from bootstrapped startup to investor-fueled one.

Applying the tech to government meetings

And that brings us to SpeakerText’s efforts to raise that money. Mireles is seeking investors, in part with the idea that a future SpeakerText Pro (which would allow a website’s branding to be part of the player) and enterprise-level deals with major video vendors would generate a revenue stream. The technology, if it evolves, would also seem to be a potential purchase for one of the big video platforms.

But the company is also seeking money from the Knight Foundation as part of the 2010 Knight News Challenge. The idea is based on using SpeakerText’s tech to generate sharable and linkable video transcripts of government meetings.

“Who goes to these city council meetings and legislative meetings? Classically, that’s newspaper reporters,” Mireles told me. “They listen to everything and filter out quotes into a story, and that’s the public record. What I’d like to do is create a framework where all government business is easily searchable, quotable, linkable, and sharable.”

Such an idea would obviously require a lot more than SpeakerText’s transcription-tying tech — a whole bunch of cameras, to start — but it’s a worthy vision of how technology could work to open up all the information locked inside video files to the text-reading world. In the meantime, SpeakerText might be a useful tool for online journalists working with word-heavy videos.

Joshua Benton is the senior writer and former director of Nieman Lab. You can reach him via email (joshua_benton@harvard.edu) or Twitter DM (@jbenton).
