Thursday, February 21, 2008

Television Control by Hand Gestures (Freeman, Weissman)

Summary:

This paper studies how a person can control a TV set with hand gestures, focusing on two questions: (1) how can the system offer a large set of commands without requiring users to learn difficult-to-remember gestures, and (2) how can the computer recognize commands in a complex vision-based setting? The authors chose gestures as the input method because they anticipated problems with voice control: fatigue, and awkwardness when making incremental changes to parameters like volume. In their approach, the user holds out an open hand to trigger a control mode. A hand icon then appears on the screen and mirrors the user's hand movements, and the user can move it over command icons to activate them (as with a mouseover) or use it to drag sliders for things like volume. Recognition is done with normalized correlation (template matching). Objects that stay stationary for some time are not processed. The authors built a prototype and found that users were excited about using gestures to control the TV, but they weren't sure whether this was just due to novelty.
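To make the recognition step a bit more concrete, here is a minimal sketch of template matching by normalized correlation. It is not the authors' implementation: it uses the zero-mean variant on plain lists of grayscale values, and the function names `normalized_correlation` and `best_match` are my own, chosen only for illustration.

```python
from math import sqrt


def normalized_correlation(patch, template):
    """Zero-mean normalized correlation of two equal-size 2D grids, in [-1, 1]."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat patch or template carries no shape information
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(image, template):
    """Slide the template over the image; return (best score, (row, col))."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best_score, best_pos = -2.0, None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = normalized_correlation(patch, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos


# Hypothetical toy data: a 3x3 "hand" template hidden in a 5x6 image,
# made brighter (scaled and offset) to mimic a lighting change.
template = [[1, 5, 1],
            [5, 9, 5],
            [1, 5, 1]]
image = [[0] * 6 for _ in range(5)]
for i in range(3):
    for j in range(3):
        image[1 + i][2 + j] = 2 * template[i][j] + 3

score, position = best_match(image, template)
```

Because the score is normalized, the brighter copy of the template still matches almost perfectly, which is roughly why this kind of measure is attractive when room lighting varies. A real system would of course work on camera frames and would need a score threshold to decide when no hand is present.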

Discussion:

The idea of giving the TV a camera with which to watch the viewer seems both interesting and unsettling. If combined with recognition of what the user is doing, this sort of technology could adjust the television's output, turning the volume down if the user is focused on reading material or a phone conversation, or up if the user is leaning forward with an intent expression as if trying to hear. I'm not sure I would want such technology in my home at all, because I think there is something to be said for predictable manual control and a sense of privacy. If users can be convinced that the system only performs real-time analysis of motion and that nothing is being recorded, they may be more likely to accept the technology. Similar monitoring around the home might also be useful for detecting injuries and illnesses that leave a person incapacitated and needing help: someone falls to the floor, a voice prompt asks, "Are you all right?", and if there is no verbal response, the system calls for outside help.

I like the idea of having a hand icon appear on the display when the user starts the interactive mode by holding up a hand. This visual feedback, in sync with the user's motions, should make the system easier and less frustrating to use. It might become annoying if people watching TV together gesture while talking about the show and trigger an extensive display. To avoid that, the hand icon could stay unobtrusive for a few seconds, with only a couple of sliders shown or a few gestures accepted as "quick commands", and instructions for the full set of commands could pop up only after a couple of seconds with no significant change in input. I'm not sure whether this would be better or not. Nor am I sure that a camera is a better solution to losing the remote in the first place than gluing a string to your remote and attaching it somewhere near where you watch TV, or something like that.

2 comments:

Grandmaster Mash said...

I think your last sentence hit on the problem with the paper as a practical application. Yet, I do think the idea of trying to even remove the need for a remote is intriguing. Fading in and out the hand icon might make it less obtrusive, but there's still the issue of "Is it better than the current solution?" In this case, no.

Paul Taele said...

I do not think the authors of this paper thought their cunning plan through. :P Good point about activating the system by accident. Fortunately, the consequences aren't dire for this type of application. The idea behind this paper was ahead of its time, but, like Aaron said, it doesn't really improve on existing methods.