
Wednesday, February 6, 2008

Cyber Composer: Hand Gesture-Driven Intelligent Music Composition and Generation (Ip, Law, & Kwong)

Summary:

This work describes a system designed to let both experienced musicians and novices compose music using hand gestures. The authors explain automated music generation in terms of music theory, discussing tonality, chord progression, closure of musical phrases (cadence), and generation of a melody that follows the chords, and how all of these can be partially automated based on general rules for what makes a coherent piece of music. They then describe their system architecture and implementation, which uses a pair of CyberGloves with Polhemus 3D position trackers. MIDI is used to synthesize the musical sound; a music interface converts musical expressions into MIDI signals; background music is generated according to music theory and the user-defined tempo and key; and the melody is generated according to hand signals, music theory, and a style template.
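To make the "general rules" idea concrete for myself, here is a toy sketch, entirely mine and not from the paper, of rule-based background generation: build diatonic triads in a user-chosen major key and walk a stock I-IV-V-I progression that ends back on the tonic, which is a very simple kind of cadence.

```python
# Toy illustration of rule-based background harmony (my own sketch, not the paper's code).

MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets from the tonic

def triad(key_root, degree):
    """Stack scale degrees 1-3-5 above the given (0-based) scale degree."""
    return [key_root + MAJOR_SCALE[(degree + step) % 7] + 12 * ((degree + step) // 7)
            for step in (0, 2, 4)]

def background_progression(key_root=60):
    # I -> IV -> V -> I: a plain but coherent harmonic skeleton ending on the tonic.
    return [triad(key_root, d) for d in (0, 3, 4, 0)]

if __name__ == "__main__":
    print(background_progression(60))  # C major: C, F, G, C triads as MIDI note numbers
```

Their system clearly does far more than this (style templates, a melody that follows the chords), but this is the flavor of "coherent by construction" that working within music-theory rules can buy you.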

They describe in depth the specific gesture mapping they chose for the system, based on five guidelines: (1) musical expressions should be intuitive; (2) those requiring fine control should be mapped to agile parts of the hands; (3) the most important expressions should be easily triggered; (4) no two gestures should be too similar; and (5) accidental triggering should be avoided. They map rhythm to wrist flexion because it is very important but doesn't require fine movement. Pitch is important, so they map it to the relative height of the right hand, though it resets at a new bar of music. Pitch shifting of melody notes also occurs if the right hand is moved far enough relative to its position for the previous melody note. Dynamics (how strongly a note is played) and volume are controlled by right-hand finger flexion: extended fingers mean a stronger note. Lifting the left hand higher than the right hand adds a second instrument, which plays in unison or harmonizes two notes higher. Cadence occurs when the left-hand fingers completely bend, and keeping the hand closed stops the music.
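As a way to check that I followed the mapping, here is how I would translate it into Python; the field names and thresholds are invented, and only the mapping logic comes from the paper.

```python
def interpret_gestures(right, left, prev_note_height):
    """right/left: dicts of per-hand glove readings for one time step (assumed fields;
    flexion values normalized to 0..1, heights in meters)."""
    events = {}

    # (1) Rhythm comes from right wrist flexion: important, but no fine control needed.
    events["trigger_note"] = right["wrist_flexion"] > 0.5

    # (2) Pitch follows the relative height of the right hand; moving far enough
    #     from the previous melody note's position shifts the pitch.
    events["pitch_shift"] = int((right["height"] - prev_note_height) / 0.05)  # 5 cm per step (made up)

    # (3) Dynamics/volume from right-hand finger flexion: extended fingers -> stronger note.
    extension = 1.0 - sum(right["finger_flexion"]) / len(right["finger_flexion"])
    events["velocity"] = int(40 + 80 * extension)

    # (4) A second instrument joins when the left hand is lifted above the right,
    #     playing in unison or harmonizing two notes higher.
    events["second_instrument"] = left["height"] > right["height"]

    # (5) Cadence when the left-hand fingers bend completely; keeping the fist
    #     closed stops the music.
    events["cadence"] = all(f > 0.9 for f in left["finger_flexion"])

    return events
```

Guideline (2) is visible here: the coarse but important controls (rhythm, pitch) live on wrist and arm position, while finger flexion only scales dynamics.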

The GUI lets the user choose an instrument, key, tonality, and tempo from drop-down menus (presumably with a mouse) before beginning composition with the CyberGloves.

Discussion:

Due to my lack of knowledge of composing music, I'm not sure I understood all of the automated music generation section; if I did, it seems as though this could limit how much variety of music can be composed with this system. Then again, I could be convinced that working within the rules still leaves more than enough flexibility to create interesting, original music, and it makes sense that it would be easier to build a system that automates some things so the user can adjust the big picture with gestures. Their system already seems difficult enough to learn without requiring the user to specify every detail of the music via hand movements, though maybe that could be alleviated with a sufficiently informative and usable GUI.

I think the part of the paper most likely to apply to other applications is their list of guidelines for choosing a gesture-to-meaning mapping, should we create our own gesture set. The guidelines seem somewhat obvious, but in designing a system and writing a paper about it, it would be good to have a list of rules like that to compare our choices against.

Wednesday, January 30, 2008

A Dynamic Gesture Interface for Virtual Environments Based on Hidden Markov Models (Chen, El-Sawah, Joslin & Georganas)

Summary:

This paper describes a system based on HMMs that does continuous dynamic gesture recognition, motivated by natural interaction in virtual environments. They review the major points of an HMM. They collect data from a CyberGlove and use three different dynamic gestures to control a cube's rotation. They use a multi-dimensional HMM, and instead of requiring pauses in gesturing to split the data into meaningful pieces, they use the standard deviation of the angle variation of each finger joint. They collected 10 data sets for each of the three gestures they wanted to recognize in order to train the HMMs. They also have a 3D hand bone-structure model to give extra feedback and show what the data from the glove looks like.
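My reading of the segmentation idea, as a sketch; this is not their code, and the rule for turning the signal into boundaries is a guess on my part.

```python
# Rolling standard deviation of joint-angle variation as a segmentation signal.
# How the paper turns this signal into boundaries isn't clear to me, so the
# "split where the signal dips" rule and the thresholds below are my guesses.

import numpy as np

def variation_signal(joint_angles: np.ndarray, window: int = 10) -> np.ndarray:
    """joint_angles: (T, J) array of per-frame joint angles from the glove.
    Returns the std-dev of angle changes over each sliding window."""
    deltas = np.abs(np.diff(joint_angles, axis=0))  # per-frame angle variation
    return np.array([deltas[t:t + window].std()
                     for t in range(len(deltas) - window + 1)])

def split_on_dips(signal: np.ndarray, thresh: float = 0.02, min_gap: int = 10):
    """Guessed rule: start a new segment wherever the variation signal dips below thresh."""
    boundaries = [0]
    for t, value in enumerate(signal):
        if value < thresh and t - boundaries[-1] > min_gap:
            boundaries.append(t)
    return boundaries
```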


Discussion:

The difference between this paper and others we've read recently is that it deals with continuous gestures rather than requiring a single brief gesture at a time with a pause before the next. From the image, I find it hard to tell exactly what gestures they chose, or to see any intuitive connection between the gestures and the idea of a rotating 3D cube, though it seems they only used the rotating cube as a visualization of how the commands are being recognized.

The idea of a repetitive, continuous gesture is something we haven't considered very much so far. Is it useful to be able to break up a graph and look for repetition, like we do with overtraced circles and spirals? Are there many natural gestures that are repetitive and continuous like this? Waving to instruct somebody to move or be quiet might fall under this pattern, but what other things are there?

Online, Interactive Learning of Gestures for Human/Robot Interfaces (Lee & Xu)

Summary:

This paper presents a system that uses HMMs to recognize gestures and to learn new ones online from only one or two examples. Their procedure is: (1) the user makes a series of gestures; (2) the system segments the data into separate gestures, then either reacts to a gesture it recognizes or asks the user for clarification; and (3) the system adds the new example to the list of examples it has seen and retrains the HMM on all the data so far using the Baum-Welch algorithm. They represent gestures by reducing the data to a one-dimensional sequence of symbols, after resampling at even intervals, dividing into time windows, and applying vector quantization. They generate the codebook for this offline using the LBG algorithm. Their segmentation process requires that the hand be still for a short time between gestures, though they suggest an acceleration threshold would be useful if the hand does not stop. They have a simple function that gives a confidence measure for each gesture's classification, and they tested the system on 14 letters of the sign language alphabet, chosen because they are not ambiguous without hand orientation data. They found 1%-2.4% error after 2 training examples and close to none after 4 or 6 examples in their two tests. Their future goals include increasing the vocabulary size by using 5 dimensions of symbols (one per finger).
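Here is roughly how I picture the interactive loop; the HMM is stubbed out (the paper uses discrete HMMs retrained with Baum-Welch) and the confidence cutoff is invented for illustration.

```python
# Sketch of the online learn-or-recognize loop as I understand it; not the authors' code.

CONFIDENCE_THRESHOLD = -50.0           # invented log-likelihood cutoff

class StubHMM:
    """Placeholder for a discrete HMM; real code would run Baum-Welch in fit()."""
    def __init__(self):
        self.sequences = []
    def fit(self, sequences):           # re-estimate parameters from all examples so far
        self.sequences = list(sequences)
    def score(self, symbols):           # log-likelihood of a codeword sequence
        return -float("inf") if not self.sequences else -float(len(symbols))

def handle_segment(symbols, models, examples, ask_user, perform):
    """symbols: quantized codeword sequence for one segmented gesture.
    models: gesture name -> HMM; examples: gesture name -> list of sequences."""
    best, best_score = None, -float("inf")
    for name, model in models.items():
        s = model.score(symbols)
        if s > best_score:
            best, best_score = name, s
    if best is not None and best_score > CONFIDENCE_THRESHOLD:
        perform(best)                   # confident: react to the recognized gesture
        label = best
    else:
        label = ask_user(symbols)       # unsure (or unknown): ask the user what it meant
        models.setdefault(label, StubHMM())
        examples.setdefault(label, [])
    examples[label].append(symbols)     # keep the new example and retrain that gesture's HMM
    models[label].fit(examples[label])
    return label
```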

Discussion:

I am curious how natural pausing between gestures will be across applications. As we've discussed in class, applications like sign language might use a very fluid series of gestures. But in the case of some kinds of commands, pauses are probably quite natural, unless you want to issue a fast sequence of commands without waiting for confirmation of comprehension between them. I can imagine "corner finding" based on direction and speed being another useful tool to segment gestures into more manageable pieces, as sketched below.
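A quick sketch of what I mean by corner finding, using only speed (a real version would probably also look at direction change); the thresholds are made up.

```python
# Candidate segmentation points = frames where speed hits a local minimum below a threshold.

def find_corners(positions, dt=1.0, speed_thresh=0.05):
    """positions: list of (x, y, z) samples; returns candidate corner indices."""
    speeds = []
    for i in range(1, len(positions)):
        d = sum((a - b) ** 2 for a, b in zip(positions[i], positions[i - 1])) ** 0.5
        speeds.append(d / dt)
    corners = []
    for i in range(1, len(speeds) - 1):
        if speeds[i] < speed_thresh and speeds[i] <= speeds[i - 1] and speeds[i] <= speeds[i + 1]:
            corners.append(i + 1)   # +1: speeds[i] measures the step ending at positions[i+1]
    return corners
```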

I think resampling at even intervals, as in this paper, will be a very good thing to keep in mind, along with jitter reduction.
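For reference, this is the standard arc-length resampling routine I have in mind (my own sketch, not code from the paper): walk the path and emit a point every path-length/(n-1) units.

```python
import math

def resample(points, n=32):
    """Resample a point sequence (2D or 3D) to n points evenly spaced along its path."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    path_len = sum(dist(points[i - 1], points[i]) for i in range(1, len(points)))
    step = path_len / (n - 1)
    pts = list(points)
    out, acc = [pts[0]], 0.0
    i = 1
    while i < len(pts):
        d = dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= step:
            t = (step - acc) / d
            q = [p + t * (c - p) for p, c in zip(pts[i - 1], pts[i])]
            out.append(q)
            pts.insert(i, q)        # continue measuring from the newly created point
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:             # guard against floating-point shortfall
        out.append(pts[-1])
    return out
```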

Monday, January 28, 2008

An Architecture for Gesture-Based Control of Mobile Robots (Iba, Weghe, Paredis, Khosla)

Summary:

This paper describes an approach for controlling mobile robots with hand gestures. The authors believe that capturing the intent of the user's 3D motions is a challenging but important goal for improving interaction between human and machine, and their gesture-based programming research is aimed at this long-term goal. They have built a gesture spotting and recognition algorithm based on an HMM. Their system setup includes a mobile robot, a CyberGlove, a Polhemus 6-DOF position sensor, and a geolocation system to track the position and orientation of the robot. The robot is directed with a small set of gestures: opening and closing the hand (moving between a flat hand and a closed fist), pointing, and waving left or right. Closing slows and eventually stops the robot; opening maintains its current state; pointing makes the robot accelerate forward (local control) or "go there" (global control); and waving left or right makes the robot increase its rotational velocity in that direction (local) or go in the direction the hand is waving (global).
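To make sure I followed the local-control scheme, here is how I would reduce it to a velocity update; the gains and gesture labels are mine, not the authors'.

```python
# Illustrative reduction of the local-control gesture commands to a velocity update.

def local_control_update(gesture, v, omega, dt=0.1,
                         accel=0.2, rot_accel=0.3, decel=0.5):
    """v: forward velocity, omega: rotational velocity of the robot."""
    if gesture == "point":            # accelerate forwards
        v += accel * dt
    elif gesture == "wave_left":      # increase rotational velocity to the left
        omega += rot_accel * dt
    elif gesture == "wave_right":     # increase rotational velocity to the right
        omega -= rot_accel * dt
    elif gesture == "close_fist":     # slow down and eventually stop
        v = max(0.0, v - decel * dt)
        omega *= 0.9
    # "open hand" maintains the current state, so no change is applied
    return v, omega
```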

They pre-process the data to improve speed and performance by reducing the glove data from an 18-dimensional vector to a 10-dimensional feature vector, augmenting it with its derivatives, and mapping the input vector to an integer codeword; they chose to use 32 codewords. They partitioned a data set of 5,000 measurements covering the whole range of possible hand movements into 32 clusters and calculated a centroid for each, so at run time a feature vector can be mapped to a codeword and a gesture can be treated as a sequence of codewords.
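The codeword step is essentially nearest-centroid vector quantization. Here is a sketch with plain k-means standing in for whatever clustering they actually used (the paper's clustering method isn't something I noted), so treat the offline half as an assumption.

```python
# Offline: cluster a large sample of feature vectors into 32 centroids.
# Online: map each incoming feature vector to the index of the nearest centroid.

import numpy as np

def build_codebook(samples: np.ndarray, k: int = 32, iters: int = 20) -> np.ndarray:
    """Plain k-means as a stand-in clustering step; samples is (N, D)."""
    samples = samples.astype(float)
    rng = np.random.default_rng(0)
    centroids = samples[rng.choice(len(samples), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((samples[:, None] - centroids) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = samples[labels == j].mean(axis=0)
    return centroids

def to_codeword(feature_vec: np.ndarray, centroids: np.ndarray) -> int:
    """Run-time step: a gesture becomes a sequence of these integer codewords."""
    return int(np.argmin(((centroids - feature_vec) ** 2).sum(-1)))
```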

Discussion:

I think the most potentially useful part of this paper is the idea of reducing gestures to a sequence of codewords, since that would greatly simplify the data we'd have to deal with. However, I wonder if they really got as thorough a sampling as they think they did, and whether it allows for subtly different gestures. I don't buy that 32 codewords would be enough for all conceivable gestures, especially for something complex: the ASL alphabet alone has 24 different static gestures that mean different things, and I'd bet we could find 9 more distinct gestures. Maybe increasing the number of codewords would help, but I'd still be wary.

I also think that while their system is a good step toward robot control using hand tracking, I'm not convinced that hand gestures are a good way to control robots doing important things that require much precision. I've played games where a character is controlled by using the up arrow to walk forward from the character's perspective and the left and right keys to turn, and that is hard enough to control without the input being interpreted as adding velocity rather than simply turning or moving at a fixed speed. As for the global controls, pointing is imprecise, especially at a distance. Think about trying to have someone point out a constellation to you: their arm is aligned with neither their line of sight nor yours. Waving seems even harder to be precise about. For situations that require accuracy in the robot's movement, I think a different set of controls would be necessary, though it might still be possible using a glove or hand tracking.

Also of note, the 8th reference in this paper refers to public domain HMM C++ code by Myers and Whitson, which might be nice to look at if we need HMM code.