Several posts have been flying around the net discussing the Goog-411 service and Google's ulterior motives for offering it. Google is making no secret of the fact that the Goog-411 directory assistance service was created and offered to the public in order to better understand speech: to collect phonemes and voice samples so that it can eventually apply speech recognition technology to its video search algorithms.
Several video search companies already use recognition technologies to better index multimedia content, and video in particular. EveryZing is a company that grew directly out of the speech-recognition technology company BBN Technologies. It creates full-text transcripts of videos and can serve videos that match keyword queries regardless of whether any metadata exists for the video itself. Blinkx.com also claims to rely heavily on speech-recognition technology, but I have yet to be able to verify this: I have taken several videos from their system and searched for words spoken within them, only to get no results.
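To make the idea of transcript-based search concrete, here is a minimal sketch in Python. It is not how EveryZing or Blinkx actually work. It only assumes that a speech recognizer has already produced a text transcript for each video, and it shows how a keyword query can then find a video without any metadata. The transcripts, video ids, and function names are all made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts):
    """Map each word to the set of video ids whose transcript contains it."""
    index = defaultdict(set)
    for video_id, transcript in transcripts.items():
        for word in tokenize(transcript):
            index[word].add(video_id)
    return index


def search(index, query):
    """Return the ids of videos whose transcripts contain every query word."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical transcripts standing in for speech-recognition output.
transcripts = {
    "video_a": "welcome to our review of the new phone",
    "video_b": "today we review a directory assistance service",
}
index = build_index(transcripts)
```

Calling `search(index, "directory assistance")` matches only the second video, purely from the words spoken in it. A real system would also have to cope with recognition errors, ranking, and timestamps, none of which this sketch attempts.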
In any case, it is clear that metadata alone will not be the future of video search. Search engines need to understand video content from the video itself, and many companies are researching speech, voice, text, image and other recognition technologies to better index it.
Google's Marissa Mayer gave an interview to Infoworld.com in October and confirmed that the service was not necessarily built to be profitable in itself, but to gather more speech data:
"Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model ... that we can use for all kinds of different things, including video search.
"The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. ... So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up or we're trying to get the voice out of video, we can do it with high accuracy."