Nieman Journalism Lab
Pushing to the future of journalism — A project of the Nieman Foundation at Harvard

SpeakerText wants to free all your words from the prison of your videos

There’s a school of thought that says video is the future of information, that rich media is the endpoint of the evolution of text. I don’t know that I buy that, since text still has so many advantages over video: its scannability, its searchability, how much easier it usually is to create and polish. But some of those edges might be temporary, as technology evolves to solve away some of video’s problems.

SpeakerText, a new startup, is trying to become one of those problem solvers by directly tying videos to their corresponding words.

Cofounder Matt Mireles, 29 and an occasional commenter around here, used to dream about being a war correspondent for The New York Times Magazine. But “reading Romenesko and getting depressed” pushed his interest more toward the intersection of journalism and technology.

Here’s his argument: It’s relatively easy for the value of a piece of text content to be shifted from its creator to someone else. Let’s say your news organization breaks big news. What happens next? Other people start writing about your big news: summarizing it, excerpting it, putting their own spin on it. Maybe they link to it too, but a lot of those links don’t get clicked on, and the “credit” in terms of eyeballs ends up spread around a lot of different sites, not just the one doing the original reporting. Or, as Mireles puts it: “Text is easily commoditized.”

But the same isn’t quite as true for video content. It’s a lot harder to satisfyingly summarize a piece of video for a blog post. (Not impossible — harder.) It’s also a lot less excerptable: If you’ve posted an hour-long video, and the juicy stuff is 39 minutes in, it’s not always easy to direct people to that spot without recutting the video.

SpeakerText tries to tackle those problems by linking points in a video with their transcripts, allowing text to be a navigational tool to locate specific points in a video. (You may have seen The New York Times do something similar for major Obama speeches.)

Here’s an example of how SpeakerText works, using a video of the NYT’s Bill Keller we wrote about back in October:

Pressing play will start both the video and the movement of highlighted text down the accompanying transcript. If you click at any point in the transcript, the video should jump to that point. This has obvious use for speeches, lectures, interviews or anything else that combines multimedia with a lot of words.
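The behavior described above (highlight the line being spoken, jump the video when a line is clicked) can be sketched as a lookup over time-stamped transcript segments. This is a minimal sketch, not SpeakerText's actual code: the `Segment` type, the `segment_at` and `seek_time_for` functions, and the sample transcript are all hypothetical names and data invented for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the beginning of the video
    text: str


# Hypothetical transcript; timestamps and wording are invented for illustration.
TRANSCRIPT = [
    Segment(0.0, "Thanks for having me."),
    Segment(4.5, "Let me start with the business model."),
    Segment(12.0, "Then we can talk about new devices."),
]


def segment_at(transcript, playback_time):
    """Return the index of the segment being spoken at playback_time.

    Assumes the transcript is sorted by start time.
    """
    starts = [seg.start for seg in transcript]
    # bisect_right counts the segments whose start is <= playback_time.
    index = bisect_right(starts, playback_time) - 1
    # Before the first timestamp, fall back to the first segment.
    return max(index, 0)


def seek_time_for(transcript, index):
    """Return the time the player should jump to when segment `index` is clicked."""
    return transcript[index].start
```

A player would call something like `segment_at` on every playback tick to move the highlight, and `seek_time_for` when a reader clicks a line. The precision of both depends entirely on how carefully the segments were time-stamped.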

And SpeakerText also allows video to be shared at the quote level. For instance, in that Keller video, the one line lots of people seized on is where he seems to (maybe?) confirm the existence of a new Apple tablet device, calling it “the impending Apple slate.” With the tool, I can link directly to that quote, 8:24 in.
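To make the quote-level link concrete: a timestamp like 8:24 is just an offset of 504 seconds, which can be attached to a video URL. The sketch below is an assumption about how such a link could be built, not SpeakerText's QuoteLink format. The function names and the example URL are hypothetical, and the `#t=` fragment follows the temporal convention from the W3C Media Fragments URI specification, which a given player may or may not honor.

```python
def timestamp_to_seconds(stamp):
    """Convert "M:SS" or "H:MM:SS" into a whole number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        # Each additional field shifts the running total up by a factor of 60.
        seconds = seconds * 60 + int(part)
    return seconds


def quote_link(video_url, stamp):
    """Build a link that points at a moment inside a video.

    The "#t=<seconds>" fragment is an assumed convention (Media Fragments
    style), not a documented SpeakerText URL scheme.
    """
    return f"{video_url}#t={timestamp_to_seconds(stamp)}"
```

For the Keller quote, `timestamp_to_seconds("8:24")` gives 504, so a link built this way would end in `#t=504`.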

Room for improvement

It’s not perfect. For one thing, it doesn’t yet work with all video providers; we at the Lab use Vimeo as our video player, which doesn’t work with SpeakerText, so I had to reupload the Keller video to YouTube to make it work. The time-tagging isn’t perfectly precise; I had a few tags that seemed to float a few seconds away from the exact moment I tied them to.

As for the transcriptions, you either need to provide them yourself or pay for them to be created by the anonymous armies of Amazon’s Mechanical Turk. I love Mechanical Turk — I use it to do all the video transcriptions on this site. But its transcriptions can be spotty — from misplaced commas and incorrect proper names to varying interpretations of whether every last “um,” “er,” and “uh” should be considered worthy for the permanent record.

But the biggest hassle is that the connections between the transcripts and the video must be made manually, by time-stamping the text. Mireles suggests having an intern do it, but since I’m internless, it was a bit of a slog. For an hour-long video, time-stamping at the sentence level would be a big pain.

Mireles told me that technology to automate the time-stamping is available for purchase and is part of the plan as they move from bootstrapped startup to investor-fueled company.

Applying the tech to government meetings

And that brings us to SpeakerText’s efforts to raise that money. Mireles is seeking investors, in part with the idea that a future SpeakerText Pro (which would allow a website’s branding to be part of the player) and enterprise-level deals with major video vendors would generate a revenue stream. The technology, if it evolves, would also seem to be a potential purchase for one of the big video platforms.

But the company is also seeking money from the Knight Foundation as part of the 2010 Knight News Challenge. The idea is based on using SpeakerText’s tech to generate sharable and linkable video transcripts of government meetings.

“Who goes to these city council meetings and legislative meetings? Classically, that’s newspaper reporters,” Mireles told me. “They listen to everything and filter out quotes into a story, and that’s the public record. What I’d like to do is create a framework where all government business is easily searchable, quotable, linkable, and sharable.”

Such an idea would obviously require a lot more than SpeakerText’s transcription-tying tech — a whole bunch of cameras, to start — but it’s a worthy vision of how technology could work to open up all the information locked inside video files to the text-reading world. In the meantime, SpeakerText might be a useful tool for online journalists working with word-heavy videos.

  • mostmodenrist

    I can’t believe it doesn’t transcribe text from the video. I thought that’s what I was clicking for. This is lousy technology. How are you going to say “Text is easily commoditized”? Your technology can’t even hear the words and write them down, and you’re going to call text commoditized?

    Here’s what will happen when somebody with skills learns the robots how to hear our words, in order for to transcribe that speech into text, at which point somebody like yourself might then declare text’s commoditization, and be wrong again. Hear’ll what happens: You will have to learn to read new languages.

    A textual revolution will unite all the verbal languages. Prescribe the revolution!


  • Matt Mireles

    Thanks Josh.

    Just to re-iterate, we built this system with $4,000. There is still MUCH room for improvement, both on the usability side and deeper technology side of things. What we’ve done now is just the beginning of a much larger, more sophisticated project that will hopefully address and answer all the weaknesses and challenges you pointed out.

    Readers should feel free to ping me with questions here in the comments or via:

    -Matt Mireles
    Founder & CEO, SpeakerText


  • Matt Mireles

    Also, I think you missed an important point: Not only will SpeakerText give you the URL to a specific moment inside a video, if you highlight text in the transcript and hit the “Quote” button (right-click and select “Copy QuoteLink”), it’ll copy a QuoteLink to your clipboard. Paste it into a blog and it looks like this:

    And we’ve been, for the most part, impressed by how well the system works. That’ll come as a surprise to all of you who fret over the many things that ought to work better.

    When you click through, the video will start at the time where the quote was said. Other people can copy and paste this into their blogs, driving both direct viral traffic to your video and lathering it in SEO.


  • George Chriss

    Thanks for the informative post!

    I would be remiss not to highlight, a project I started this summer that has a fair degree of overlap with SpeakerText. A video introduction:

  • Adam Levy

    Well, this looks fairly terrific. It would certainly make my job a lot easier.

  • ne-web Design Newcastle

    I’ve not heard of this before. This could become pretty useful for me with some projects that I have planned. Thanks for the heads up!

  • Dave Chase

    [Disclosure: My consulting firm has worked with the company I am about to mention.]
    Intel-backed Delve Networks has provided a similar technology for some time. The lack of scannability of audio (whether in a podcast or video) is one of the core reasons was founded 4 years ago. You can see the Delve implementation at using the Obama inaugural as an example. The “Search inside” feature is the first example of several product Semantic Video Technology* features that will roll out. Using the same insight that allows for Search Inside also enables auto-tagging as well as the creation of a video site map that gets pushed to the search engines for SEO purposes.

    This technology has been in the market for awhile and has everyone from the Stars & Stripes and local papers to the NFL, ESPN, and many others using it. SpeakerText looks like a nice add-on to YouTube especially if you have resources to do transcriptions. Someone using Delve uses it as an end-to-end video platform (e.g., handles UGC video, analytics, content management, syndication, ad integration, etc.) and all of this is automated (no transcription required).

    Given my media background one of the reasons they hired my firm was to help them understand how best to serve the media market. Feel free to contact me if you’d like to delve into what they do. [dave -at- sunvalleyonline {dot} com]

    * Semantic Video Technology is what enables the scenario by combining voice recognition with semantic analysis derived from crawling hundreds of millions of web pages. This is how in the demo you can search on “economy” and it will be smart enough to know that that term relates to terms like “jobs”, “wealth”, “markets”, etc. and have it show up on the heat map.
