If you have been keeping up with my posts on this blog you won't be surprised to learn that today I spent my lunch hour exploring a video search offering that's new to me called VideoSurf. I was so interested in this new search tool that I interrupted my usual run of image indexing articles, and my lunch hour, to do some research and write up this post.
In a September press release VideoSurf claimed its computers can now "see inside videos to understand and analyze the content." I would encourage anyone with an interest in this area to take a look at the company's website, give it a whirl and see what they think.
In my experience, video search engines have relied on a combination of the metadata linked to video clips, scene and key frame analysis, and automatic indexing of soundtracks synchronised with the video.
For example, a soundtrack synchronised to video content can be transcribed to text and indexed; that text can then be linked to sections of the video by detecting scene boundaries, with various techniques used to create key frames that attempt to represent each scene. These techniques are backed up by the metadata that accompanies a video clip.
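To make the scene-boundary and key-frame idea concrete, here is a toy sketch of how a cut between scenes might be detected by comparing consecutive frames. Frames are simplified to flat lists of grayscale pixel values, and the threshold value is an illustrative assumption, not anything VideoSurf has published.

```python
# Toy scene detection: a large jump in average pixel difference between
# consecutive frames is treated as a cut between scenes.

def frame_difference(a, b):
    """Mean absolute pixel difference between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def detect_scenes(frames, threshold=40.0):
    """Return (start, end) index pairs, one per detected scene."""
    boundaries = [0]
    for i in range(1, len(frames)):
        if frame_difference(frames[i - 1], frames[i]) > threshold:
            boundaries.append(i)  # big jump => cut between frames i-1 and i
    boundaries.append(len(frames))
    return [(boundaries[i], boundaries[i + 1])
            for i in range(len(boundaries) - 1)]

def key_frames(frames, scenes):
    """Pick the middle frame of each scene as its representative key frame."""
    return [frames[(start + end) // 2] for start, end in scenes]

# Two "scenes": three dark frames followed by three bright frames.
frames = [[10, 12, 11]] * 3 + [[200, 198, 205]] * 3
print(detect_scenes(frames))  # [(0, 3), (3, 6)]
```

Real systems work on full colour frames and use far more robust statistics, but the principle of thresholding inter-frame change is the same.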
If you have worked in the industry you know that video metadata is expensive to create. Most of what people see online is either harvested for free from other sources, or limited in size and scope. Such metadata may cover the title of a video clip, text describing the clip, clip length, etc. It may even include information about the content depicted in the video, or abstract concepts that try to specify what a clip is about. Though this level of video metadata is the most time-consuming and complex to create, it also offers the fullest level of access for users.
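The layers described above can be illustrated with a hypothetical metadata record; all field names and values here are made up for illustration.

```python
# A hypothetical video metadata record, ordered roughly from the cheapest
# fields to the most expensive conceptual ones.

clip_metadata = {
    # Cheap technical fields, often harvested automatically:
    "title": "Storm over the harbour",
    "duration_seconds": 94,
    # Descriptive text, moderately expensive to write:
    "description": "Time-lapse of storm clouds gathering over a fishing harbour.",
    # Depicted content: what can literally be seen in the frames:
    "depicted": ["harbour", "fishing boats", "storm clouds"],
    # Abstract concepts: what the clip is *about* -- the most
    # time-consuming layer to create, and the richest for searchers:
    "concepts": ["menace", "anticipation", "power of nature"],
}

# A conceptual query that only the most expensive layer can answer:
print("menace" in clip_metadata["concepts"])  # True
```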
Audio tracks can also be of great use, and many information needs can be met by searching on the audio in a video. There are limitations, however; for example, many VERY SCARY scenes have little dialogue in them and depend heavily on camera-work and music to create the feeling of fear. How easy is it to find those scenes based on dialogue alone, or even by 'seeing inside a video'? How can you look for 'fear' as a concept?
Content-based image retrieval, which looks at textures, basic shapes, and colours in still images, has yet to deliver the promised revolution in image indexing and retrieval. In some contexts it works quite well; in many others, end-users don't really see how it works at all. So adding a layer to video search that tries to analyse the actual content, pixel by pixel, is an interesting development.
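One classic CBIR technique of the kind mentioned above is comparing images by colour histogram. The sketch below uses flat lists of grayscale values and histogram intersection as the similarity measure; the bin count is an illustrative choice.

```python
# Minimal CBIR sketch: two images are "similar" when their intensity
# histograms overlap heavily.

def histogram(pixels, bins=8, max_value=256):
    """Count how many pixels fall into each intensity bin."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    return counts

def intersection(h1, h2):
    """Histogram intersection: 1.0 = identical distributions, 0.0 = disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2)) / sum(h1)

dark   = [10, 20, 30, 15]
dark2  = [12, 22, 28, 18]
bright = [240, 250, 245, 235]

print(intersection(histogram(dark), histogram(dark2)))   # 1.0
print(intersection(histogram(dark), histogram(bright)))  # 0.0
```

Note what this measure cannot do: it says two images share a colour distribution, not what they depict, which is exactly why end-users often find CBIR results hard to interpret.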
To my mind, a full set of access paths to all the layers of a video still demands the use of fairly extensive metadata, especially for depicted content and abstract concepts. Up to now, metadata has always been the way to find what an image, whether still or moving, is conceptually about, and what can be seen in individual images and videos, even when that metadata is actually sound turned into text and stored in a database.
Is VideoSurf's offering really any different from what's gone before?
Is this system, which seems to be using Content-Based Image Retrieval (CBIR) technology to some extent, a significant advance?
Reviewing some of the blog posts people have published, it seems many others are interested in VideoSurf's offering as well.
For an initial idea of how VideoSurf works, try taking a look at James McQuivey's OmniVideo blog post, "Video search, are we there yet?". As James describes in the article, one pretty neat aspect of what VideoSurf can do is match faces, enabling you to look for the same face in different videos and reducing the reliance on the depicted person being named in the metadata. This clearly isn't much help if the person you're looking for is mentioned but not depicted (in which case indexed audio would help), or if the person is not well depicted, for example shown only from the side or the back. Quibbles aside, though, if this works, it is a pretty useful function in itself.
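Face matching across videos is commonly framed as comparing fixed-length embedding vectors. The sketch below is a hypothetical illustration of that framing, not VideoSurf's actual method: the embedding values and the similarity threshold are made up, and a real system would derive the vectors from a trained face model.

```python
# Hypothetical face matching: two faces "match" when the cosine
# similarity of their embedding vectors exceeds a threshold.

import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_person(face_a, face_b, threshold=0.9):
    return cosine_similarity(face_a, face_b) >= threshold

# Made-up embeddings: two shots of one actor, one shot of someone else.
actor_clip1  = [0.90, 0.10, 0.30]
actor_clip2  = [0.88, 0.12, 0.31]
someone_else = [0.10, 0.90, 0.20]

print(same_person(actor_clip1, actor_clip2))   # True
print(same_person(actor_clip1, someone_else))  # False
```

This also shows the limitation noted above: a face shot only from the side or the back yields a poor embedding, and no threshold can rescue the match.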
Here are some of the other bloggers who have been writing their thoughts on VideoSurf. For example:
Clearly, we're on the right track, and there is a lot of interest in the opportunities and technologies around video search. However, I think there is a long way to go before detailed, automatic object recognition is of any meaningful use to people. As far as I can see, it's still not there for either still or moving digital images. Metadata, for me, is still the 'king' of visual search. There are, however, a growing number of needs that automatic solutions can already meet, and a growing case for solutions that combine automatic computer recognition of image elements, metadata schemes, and controlled vocabulary search and browse support.
I'd love to know what people think about VideoSurf and other services that provide video search.