The Future of Image and Video Searching

In case you missed it, you should definitely check out the entire November 23 issue of The New York Times Magazine. It was a special "screens" issue devoted to "how we watch stuff" -- from iPod nanos to flat-screen TVs to IMAX theaters. All the articles are worth reading, but the one I found most thought-provoking was "Becoming Screen Literate," by Kevin Kelly.

Among the many fascinating ideas in this article was an analogy between how we create and search for text and how we create and search for images or video.

When writing any text (from this brief column to the great American novel), we start with the same alphabet and the same collection of words. Neither the letters nor the words are "original." There are only 26 letters -- and all the words can be found in a dictionary. What makes any writing creative or unique is how these elements are combined.

Video has no comparable alphabet or dictionary yet. If you want to include an elephant in your home video, for example, you are generally expected to go out and find an elephant to film. If you instead copied a clip of an elephant from someone else's movie, the way you would pick a word from a dictionary, it would generally be regarded as copyright infringement.

However, as the Times article goes on to point out, this is changing. It is becoming increasingly acceptable to take images and video from the Web and combine them in a unique way for your own creative work. The author cites the popularity of mashups as an example: "In fact, the habits of the mashup are borrowed from textual literacy. You cut and paste words on a page. You quote verbatim from an expert. You paraphrase a lovely expression. You add a layer of detail found elsewhere. You borrow the structure from one work to use as your own. You move frames around as if they were phrases."

What would make this easier is a better way to find such images and clips: the equivalent of a searchable dictionary for video. You can already search for images and videos on the Web with tools such as Google. But the author envisions something more: "The holy grail of visuality is to search the library of all movies the way Google can search the Web. Everyone is waiting for a tool that would allow them to type key terms, say 'bicycle + dog,' which would retrieve scenes in any film featuring a dog and a bicycle... Google can instantly pinpoint desirable documents out of billions on the Web because computers can read text, but computers are only starting to learn how to read images."

This type of search would require the tool to translate the word "dog," for example, into what a dog looks like, so that it could find dogs in every movie even when no text tags indicated that a dog appeared in a given scene. A big task!

Still, this got me thinking about smaller-scale versions of the task, ones that could help me on my own Mac. Consider iPhoto, for example.

I now have over 10,000 photos in my iPhoto collection. Unfortunately, until just a few months ago, I had never tagged any of them. The lack of tags became an issue recently when I wanted to find a photo of my wife and me. I had grouped my photos into events, but I could not recall which event contained the photo I was seeking. What I wanted was a way to say, "Search for all photos that have both Ted and Naomi in them." Without tags for "Ted" and "Naomi," there was no way to do this.

What would be really great is if I could find a photo of myself, use the cursor to draw a circle around my face, and then tell iPhoto: "Find all photos that contain the selected face." That would be a true visual search, analogous to the text keyword searches we have now. And it might just save me the hassle of going back and tagging all my photos.

Extending this idea, I'd like to be able to save these search images in a sort of "graphic dictionary." On future occasions, I could simply select the face I wanted from this dictionary, without first having to find a photo of myself as a starting point.

I expect we are still years off from being able to do all of this. But it's coming.

While you're waiting, check out another article in the same issue of the Times magazine. This one explores how Netflix uses your movie ratings to decide which other movies to recommend to you. It's a harder task than you might imagine. In fact, it's so difficult that if you can improve the accuracy of Netflix's recommendations by 10%, Netflix will pay you a million dollars! Time for me to get to work!