Google has recently created a video search. It's interesting: it takes television streams off the air, decodes the closed captioning, and then lets you search it. While it is still young, it already has some uses, such as finding out when someone on your favorite program delivered a line you're looking for.

Now, there has been some speculation in the press that Google is using its vast amounts of web data to assist in machine translation. Basically, they have a huge catalog of how language is actually used, and they should be able to correlate that usage across languages in a way that would produce translations capturing even idiomatic meaning.

Now, if we had a similarly huge collection of video frames, along with information on what the people in the show are saying, what cool stuff could we do with that? One interesting thing would be to use it for image recognition.

The closed captioning tells us what the on-screen characters are talking about, and most of the time they are talking about something else that is on screen. So, if the characters are talking about a book, it is likely that a book appears in that video frame and the ones around it. If we collect thousands of video frames containing books, we should be able to isolate what a "book" looks like. Then, if there is enough data, we should be able to work backwards from a new frame to see whether there is a book in it.
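The idea above can be sketched in miniature. This is a toy illustration, not anything Google has described: the frames, feature vectors, and function names are all made up. Captions mentioning "book" weakly label frames as positives, the positives are averaged into a crude prototype of what a book looks like, and a new frame is scored by how closely it resembles that prototype.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical (caption, feature_vector) pairs: stand-ins for decoded
# closed-caption text and some per-frame visual features.
frames = [
    ("she opened the book slowly",    [0.90, 0.10, 0.20]),
    ("that book changed my life",     [0.80, 0.20, 0.10]),
    ("put the book on the shelf",     [0.85, 0.15, 0.15]),
    ("the car sped down the highway", [0.10, 0.90, 0.80]),
    ("nice car you have there",       [0.20, 0.80, 0.90]),
]

# Weak labeling: any frame whose caption mentions "book" is a positive.
positives = [feat for cap, feat in frames if "book" in cap.split()]

# Crude "what a book looks like": the mean of the positive vectors.
dims = len(positives[0])
prototype = [sum(f[i] for f in positives) / len(positives)
             for i in range(dims)]

def looks_like_book(features, threshold=0.9):
    """Work backwards from a new frame: does it resemble the prototype?"""
    return cosine(features, prototype) >= threshold

print(looks_like_book([0.88, 0.12, 0.18]))  # book-ish frame -> True
print(looks_like_book([0.15, 0.85, 0.85]))  # car-ish frame  -> False
```

With real video you would of course need far richer features and far more data than a mean vector over a handful of frames, which is exactly the point: the technique only becomes plausible at the scale of data a service like this would accumulate.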

Now, this isn't to say that any of this is trivial -- but it is interesting to think about what you can do when you have tons and tons of data.


posted Feb 8, 2005 | permanent link