Monday, February 25, 2008

Computer Vision-based Gesture Recognition for an Augmented Reality Interface (Storring, Moeslund, Liu, Granum)

This paper discusses wearable computing and augmented reality (AR). The authors argue that such interfaces should include gesture recognition so that the user can input commands directly. Their focus here is on building toward a multi-user system for round-table meetings. Because a table can be assumed and most of the expected gestures involve pointing, they consider it reasonable to restrict the gesture set to six gestures: a fist, and one to five fingers extended. All gestures then lie in a plane, which reduces the recognition problem to 2D. Their system first does low-level pre-segmentation of the hand image, using pixel color (skin color) to find the hand: the non-hand part of the image is set to black and the hand shape to white. The palm is then matched to a circle and the fingers to rectangles centered on the circle, and gestures are differentiated by how many fingers are extended. The authors give no numerical results, but say that users adapted quickly to the system and that the recognition rate was high enough for users to find the gesture interface useful in the AR round-table architecture planning application.
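To make the finger-counting idea concrete, here is a minimal sketch of one way to count extended fingers in a black-and-white hand mask: walk around a circle slightly larger than the palm and count how many separate runs of hand pixels it crosses. This is an illustration only, not the authors' implementation. The function names, the synthetic test hand, and the chosen radius are all assumptions, and a real image would also contain the wrist, which this sketch ignores.

```python
import math


def count_extended_fingers(mask, center, radius, samples=360):
    """Count separate runs of hand pixels crossed by a circle around the palm.

    mask is a list of rows of booleans (True = hand pixel), center is the
    (x, y) palm center, and radius should be a bit larger than the palm.
    All of these names are hypothetical, chosen for this illustration.
    """
    cx, cy = center
    height, width = len(mask), len(mask[0])
    on_hand = []
    for i in range(samples):
        angle = 2.0 * math.pi * i / samples
        x = int(round(cx + radius * math.cos(angle)))
        y = int(round(cy + radius * math.sin(angle)))
        inside = 0 <= x < width and 0 <= y < height
        on_hand.append(bool(inside and mask[y][x]))
    # A run starts wherever a hand sample follows a background sample.
    # Index -1 wraps around, so the circle is treated as closed.
    return sum(1 for i in range(samples) if on_hand[i] and not on_hand[i - 1])


def synthetic_hand(finger_xs, size=64, palm_radius=10, finger_width=3):
    """Build a toy mask: a disc for the palm plus vertical bars for fingers."""
    c = size // 2
    mask = [[(x - c) ** 2 + (y - c) ** 2 <= palm_radius ** 2
             for x in range(size)] for y in range(size)]
    for fx in finger_xs:
        for y in range(5, c):
            for x in range(fx, fx + finger_width):
                mask[y][x] = True
    return mask


if __name__ == "__main__":
    hand = synthetic_hand([24, 31, 38])
    print(count_extended_fingers(hand, (32, 32), 16))
```

Note that fingers held together would merge into a single run here, so this sketch would undercount them.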

Discussion:

I wonder how well their system does with fingers that are not held apart. It would probably be possible to recognize multi-finger blobs in cases where a human could infer that one "finger" is twice the width of the other fingers, or half as wide as the palm, and so is probably two fingers together. But tweaking the system to deal with those harder cases might be more of a pain than moving to another recognition method. And then, of course, there is the issue of moving to any more complicated gesture set that doesn't translate well to 2D.
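The width heuristic could be sketched roughly as follows. This is my own guess at how it might work, not something from the paper: the function name is made up, and the 0.25 finger-to-palm ratio is an assumption taken from the "half as wide as the palm means two fingers" reasoning above.

```python
def estimate_finger_count(blob_widths, palm_width, finger_to_palm_ratio=0.25):
    """Estimate total fingers when some blobs may be several fingers together.

    blob_widths holds the measured width of each finger-like blob, in the
    same units as palm_width. finger_to_palm_ratio is an assumed value,
    not one reported by the authors.
    """
    expected_width = palm_width * finger_to_palm_ratio
    total = 0
    for width in blob_widths:
        # Round to the nearest whole number of fingers, but never below one:
        # a detected blob is at least one finger.
        total += max(1, int(width / expected_width + 0.5))
    return total


if __name__ == "__main__":
    # One normal-width blob plus one blob twice as wide.
    print(estimate_finger_count([10, 20], palm_width=40))
```

A fixed ratio like this would likely need per-user calibration, which is part of why tweaking for these cases might not be worth it.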

Also, how did they determine that thumb movement was the most natural choice for a click gesture? Was that their own design choice, or did they settle on it from user feedback?

1 comment:

- D said...

I really, /really/ like the use of the concentric circles and radius counting features. It's a lot like the radial bin histogram used by Oltmans (developed by Alvarado???) in his PhD thesis.

That's the only thing I liked about this paper.