Monday, January 28, 2008

An Architecture for Gesture-Based Control of Mobile Robots (Iba, Weghe, Paredis, Khosla)

This paper describes an approach for controlling mobile robots with hand gestures. The authors argue that capturing the intent of a user's 3D motions is a challenging but important goal for improving human-machine interaction, and their gesture-based programming research is aimed at this long-term goal. They built a gesture spotting and recognition algorithm based on an HMM. Their system setup includes a mobile robot, a CyberGlove, a Polhemus 6-DOF position sensor, and a geolocation system that tracks the position and orientation of the robot. The robot is directed with a small set of hand gestures: opening (relaxing to a flat hand), closing (making a fist), pointing, and waving left or right. Closing slows and eventually stops the robot; opening maintains its current state; pointing makes the robot accelerate forward (local control) or "go there" (global control); and waving left or right increases the robot's rotational velocity in that direction (local) or sends it in the direction the hand is waving (global).
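The paper gives this gesture-to-motion mapping in prose rather than as an explicit control law; as a rough Python sketch of how the local-control mode described above might look -- the gesture labels, gain constants, and update function here are my own assumptions, not the authors' implementation:

# Hypothetical local-control mapping: each recognized gesture adjusts the
# robot's commanded velocities rather than setting them directly.
ACCEL_GAIN = 0.05     # m/s added per "point" update (assumed value)
TURN_GAIN = 0.10      # rad/s added per "wave" update (assumed value)
BRAKE_FACTOR = 0.8    # fraction of velocity kept per "close" update

def update_velocity(gesture, v, w):
    """Return new (linear, angular) velocities after one recognized gesture."""
    if gesture == "point":          # accelerate forward
        v += ACCEL_GAIN
    elif gesture == "close":        # slow down and eventually stop
        v *= BRAKE_FACTOR
        w *= BRAKE_FACTOR
    elif gesture == "wave_left":    # increase rotational velocity to the left
        w += TURN_GAIN
    elif gesture == "wave_right":   # increase rotational velocity to the right
        w -= TURN_GAIN
    # "open" maintains the current state, so nothing changes
    return v, w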

They pre-process the data to improve speed and performance: the glove data is reduced from an 18-dimensional vector to a 10-dimensional feature vector, augmented with its derivatives, and the resulting input vector is mapped to an integer codeword -- they chose to use 32 codewords. To build the codebook, they partitioned a data set of 5,000 measurements covering the whole range of possible hand movements into 32 clusters and computed a centroid for each, so at run time a feature vector can be mapped to its nearest codeword and a gesture can be treated as a sequence of codewords.
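The paper doesn't spell out which clustering algorithm they used to build the codebook; here is a minimal Python sketch of that kind of vector quantization, assuming k-means (via scikit-learn) and a made-up 20-dimensional feature vector (10 features plus their derivatives):

import numpy as np
from sklearn.cluster import KMeans

# Training set: 5000 feature vectors covering the range of hand movements
# (dimensions and data are placeholders, not the authors' recordings)
training_features = np.random.rand(5000, 20)

# Partition into 32 clusters; the cluster centroids form the codebook
codebook = KMeans(n_clusters=32, n_init=10).fit(training_features)

def to_codeword(feature_vector):
    """Map a feature vector to the index of its nearest centroid (its codeword)."""
    return int(codebook.predict(feature_vector.reshape(1, -1))[0])

# At run time a gesture becomes a sequence of codewords, which is what the HMM sees
gesture_stream = np.random.rand(15, 20)   # 15 consecutive feature vectors
codeword_sequence = [to_codeword(f) for f in gesture_stream]

The payoff of this step is that the HMM only has to model a discrete alphabet of 32 symbols instead of a continuous high-dimensional feature space.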

Discussion:

I think the most potentially useful part of this paper is the idea of reducing gestures to a sequence of codewords, since that would greatly simplify the data we'd have to deal with. However, I wonder whether they really got as thorough a sampling as they think they did, and whether it allows for subtly different gestures. I don't buy that 32 codewords would be enough for all conceivable gestures, especially for anything complex -- ASL alone has 24 different static gestures that mean different things, and I'd bet we could find at least 9 more distinct gestures, which already pushes past 32. Maybe increasing the number of codewords would help, but I'd still be wary.

I also think that while their system is a good step toward robot control using hand tracking, it isn't convincing evidence that hand gestures are a good way to control robots doing important work that requires much precision. I've played games where a character is controlled with the up arrow to walk forward from the character's perspective and the left and right keys to turn, and that is hard enough to control without the input being interpreted as adding velocity rather than turning or moving at a fixed speed. As for the global controls, pointing is imprecise, especially at a distance. Think about trying to have someone point out a constellation to you: their arm is aligned with neither their line of sight nor yours. Waving seems even harder to be precise about. For situations that require accuracy in the robot's movement, I think a different set of controls would be necessary, though it might still be possible using a glove or hand tracking.

Also of note, the 8th reference in this paper refers to public domain HMM C++ code by Myers and Whitson, which might be nice to look at if we need HMM code.

2 comments:

Brandon said...

reducing gestures into codewords through vector quantization is quite common with HMMs. essentially HMMs are best at handling single inputs (i.e. single observations like a coin flip, a selected color ball, and other toy examples mentioned in the Rabiner paper). when you have a multiple-input system, it is quite common to use some form of clustering to turn multiple observations into a single observation. this is yet another downfall of HMMs according to some people, because it is an additional step that adds error to classification. is the error from vector quantization or is it from the actual HMM model? could the error be reduced by adding more clusters to the vector quantization? what about more states in the HMM? there are just too many variables and too much guess and check.

Paul Taele said...

You made a good point concerning hand tracking with robot control. In terms of precision, that would not be the best way to go, especially for a single robot. For that situation, one is better off just using a regular remote controller. The application would make more sense for a group of mobile robots with at least somewhat autonomous capabilities. With hand tracking, the person can perform commander-like functions instead of micromanaging multiple robots. In theory, that's the kind of capability this paper could enable. In practice, well...the paper has some more work to do.