Monday, February 25, 2008

Georgia Tech Gesture Toolkit: Supporting Experiments in Gesture Recognition (Westeyn, Brashear, Atrash, Starner)

Summary:

This paper describes a toolkit, GT^2k, to aid in gesture recognition via HMM, which makes use of another existing toolkit, HTK, that supports speech recognition. GT^2k allows for training models with both real-time and off-line recognition. They discuss four sample applications for which their toolkit has been applied. A gesture panel in automobiles recognizes simple gestures with camera data and gets 99.2% accuracy. A security system is built to have recognition of patterned blinking, to go with face recognition, so that the system can't be fooled with a still photograph of an authorized person, and it gets an 89.6% accuracy. TeleSign is a system that does mobile sign language recognition, but real-time recognition was not yet implemented, but they got 90% accuracy by combining vision and accelerometer data. A workshop activity recognition system is built to recognize actions made while constructing an object in a workshop, such as hammering or sawing, based on accelerometer data, and they found 93% accuracy, though again not in real-time.

Discussion:

The blinking idea is interesting, but I have a mental image of someone with holes cut in a photograph, maybe with fake eyelids attached that blink when a string is pulled. Plus, from an interface standpoint, it seems like most people who have already been trained by today's hand-operated devices would be more comfortable using their hands to input a code. Maybe an input pad hanging on a cord that the user can pull close and hide from people standing behind them would be a simpler solution.

It does seem like a valuable idea to have a toolkit that can be used in multiple gesture recognition applications, and anything that could improve accuracy could benefit a lot of people.

1 comment:

- D said...

I like the toolkit in general, but all their examples were horrible uses about the power of HMMs. They were all too simple and would probably be handled better with simpler algorithms.