YouTube Speech-to-Text Recognition Functionality

YouTube, the Google-owned online video service, has decided that rich metadata may be the way to go. The company believes it could make a big splash and change the way videos are indexed.

The hot topic of the year so far has been making video indexing more useful and effective. That means being able to attach more indexable data to the videos themselves. Adobe recently announced automatic speech-to-text transcription in its upcoming video editing tools, including Premiere and Flash, while video search engines like Blinkx and Truveo have been using different technologies to make videos more searchable by looking inside them.

As reported on ReelSEO in July, YouTube is testing speech-to-text on U.S. presidential campaign videos. The test has been running since June and has massive potential to make all video on YouTube easier to search. It should also make search results far more relevant than they have been.

The speech in each political video is converted to text, which is made searchable so that you can find every instance of a specific phrase such as "video search engine optimization." Well, not many politicians are talking about VSEO, so a far better phrase to search for would be 'tax policy' or 'war in Iraq.' The service covers presidential campaign videos as well as other political videos and can be found here.

Essentially, YouTube is using speech-to-text to listen to the audio and analyze what it hears. The result is converted into text and attached to the video itself as metadata, much like what Adobe is promising for its upcoming video editing tools. While the video is playing, you can place the cursor on the highlighted areas to see the context of the exact phrase. Unfortunately, you cannot display the full transcript beneath the video as it plays.
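To make the idea concrete, here is a minimal sketch of how a phrase search over a time-stamped transcript might work. It is not YouTube's implementation, which has not been published; the `Word` structure, the `find_phrase` function, and the sample words and timings are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word and the time (in seconds) at which it is spoken."""
    text: str
    start: float


def find_phrase(transcript: list[Word], phrase: str) -> list[float]:
    """Return the start time of every occurrence of `phrase` in the transcript."""
    target = phrase.lower().split()
    if not target:
        return []
    words = [w.text.lower() for w in transcript]
    hits = []
    # Slide a window the length of the phrase across the transcript.
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            hits.append(transcript[i].start)
    return hits


# Invented sample data: (word, start time in seconds). Not taken from a real video.
sample = [
    ("our", 0.0), ("tax", 0.5), ("policy", 0.9), ("must", 1.4),
    ("change", 1.8), ("and", 2.3), ("our", 2.6), ("tax", 3.0),
    ("policy", 3.4), ("will", 3.9), ("change", 4.2),
]
transcript = [Word(text, start) for text, start in sample]

print(find_phrase(transcript, "tax policy"))  # [0.5, 3.0]
```

Each returned time is a spot a player could highlight on the timeline. The sketch matches exact words only, so a transcription error inside the phrase would hide that occurrence.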

The YouChoose service is not foolproof. Searching on 'video game' returned erroneous results, and the transcribed text did not exactly match what Barack Obama said in his 2008 Father's Day speech. In the Flint, Michigan speech he reiterates that "parents must turn off the television set and put away the video games and read to your child," where the transcription says "television set well the way the video game and read your child." But it did accurately pinpoint the spot in the videos where he said the phrase.

The YouTube YouChoose feature is interesting and does hold some promise for the future of video search; however, it does almost nothing to address accessibility, which the Adobe feature may. The transcription cannot be displayed during playback, so it is essentially only useful for search purposes on YouTube. The Adobe-based feature should allow viewing of the entire transcription during video playback, thanks to the embedded nature of the metadata and added functionality in the Flash player.

Both are a step in the right direction with regard to video search optimization and video indexing on the web. They are the paving stones to the future and the first steps toward a far more searchable video-based web.

Posted in Video Search, Video SEO
About the Author -
Christophor Rick is a freelance writer specializing in technology, new media, video games, IPTV, online video advertising and consumer electronics. His past work has included press releases, copywriting, travel writing and journalism. He also writes novel-length and short fiction as part of Three-Faced Media.

What do you think?
  • Abrar Ahmed

    Another revolutionary step. I'm sure SEOs will now look at this opportunity too.

  • Guest

    As far as I know, fully automatic speech recognition is still years away from being useful. Computers still cannot understand the fine nuances of human speech well enough to produce intelligible text, and they still cannot tell the difference between "two", "too" and "to". While speech recognition facilitates transcription, it cannot be done without fine-tuning by humans.

  • Alfred

    Check this out: YouTube might be using SpinVox, a Cambridge University technology spinoff, for speech-to-text using artificial intelligence.

  • reelseo

    Not sure I understand what you want to do... Basically, you want to take a Youtube Video and get the transcript from the speech? Is that correct?

  • Youtube Lover

    So, I was searching for a program to convert YouTube video voice to text and found this page.
    I'm sorry if I missed what I need here, so please, any idea how to transform voice to text?