Summary:
This short paper describes a system for recognizing finger-spelled ASL letters, motivated by helping integrate deaf communities into mainstream society, especially people who are more comfortable with ASL than with reading and writing English. The authors used a CyberGlove to capture hand-posture data -- it isn't clear to me whether they record a stream of data or just take a snapshot of the glove's sensor values at one point in time. They left out J and Z because those letters involve motion through 3D space. Recognition was done in Matlab with the Neural Networks toolbox. They report 90% accuracy on the 24 letters they kept, but only for the person whose data was used to train the network.
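The paper gives few implementation details beyond the Matlab toolbox, but the pipeline it describes boils down to one feature vector of glove sensor readings per letter fed to a small feed-forward classifier over the 24 static letters. Here is a minimal sketch of that kind of classifier; the sensor count, layer size, and training data are placeholders of my own, not values from the paper:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Assumed setup: the paper used Matlab's Neural Networks toolbox; the sensor
# count, network size, and data below are illustrative placeholders only.
N_SENSORS = 18                                       # a CyberGlove reports ~18-22 joint angles
LETTERS = list("ABCDEFGHIKLMNOPQRSTUVWXY")           # 24 letters, J and Z omitted

def snapshot_features(raw_readings):
    """One static hand posture -> one feature vector (no temporal data)."""
    return np.asarray(raw_readings, dtype=float)

# X: one glove snapshot per training example; y: the letter being spelled.
# In the paper these would all come from a single signer.
X_train = np.random.rand(240, N_SENSORS)             # placeholder data
y_train = np.repeat(LETTERS, 10)                     # 10 samples per letter

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

new_snapshot = np.random.rand(1, N_SENSORS)          # placeholder query posture
print(clf.predict(new_snapshot))                     # e.g. ['A']
```

With real data, the interesting question is exactly the one raised above: whether a single snapshot per letter is enough, or whether a short window of readings is needed even for "static" letters.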
Discussion:
This is an interesting paper, but as the authors say, it is only a beginning step. Recognizing 24 letters of the alphabet is nice, but it isn't nearly sufficient for any kind of normal-speed conversation. Would you ask a hearing person to spell out their sentences instead of speaking in words? Even setting aside natural language processing, assuming there is a distinct gesture for every word, and supposing our only goal is to recognize words so as to translate them directly, a huge number of gestures would have to be learned to support comfortable, natural conversation. The 90% accuracy reported for the alphabet will almost certainly drop as the search space grows. Based on this paper alone, sign language recognition is far from solved -- I'd expect it to be a hard problem.
Additionally, it might be important to check whether wearing a glove affects hand movements, in case anyone ever tries to use glove data to train vision-based recognition systems.
4 comments:
Finger spelling should be an easy first step, much like single-character handwriting recognition. A good system should be able to reach at least 95-99% accuracy, especially since there's no 'cursive' vs. 'print' finger spelling, just 26 unique signs.
As for recognizing the natural language of ASL, it wouldn't be as hard as full-blown English. Many English stop words (the, and, a) simply aren't present in ASL, whose grammar descends from French Sign Language. You'd have to use a hidden Markov model (or another temporal model) with a streaming input of finger and position data to classify complete signs. I'd compare this problem to natural speech recognition in complexity and scope.
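A rough sketch of the temporal approach this comment describes: train one HMM per sign on streamed glove frames and pick the sign whose model scores a new recording highest. The library, feature dimensions, sign names, and data below are my own assumptions, not anything from the paper or the comment:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

N_FEATURES = 20   # per-frame glove joint angles plus hand position (assumed)

def train_sign_model(sequences):
    """Fit one HMM to all recorded sequences of a single sign."""
    X = np.vstack(sequences)                # frames from all recordings, stacked
    lengths = [len(s) for s in sequences]   # boundaries between recordings
    model = GaussianHMM(n_components=5, covariance_type="diag", n_iter=25)
    model.fit(X, lengths)
    return model

def classify(sequence, models):
    """Pick the sign whose HMM gives the streamed frames the highest likelihood."""
    return max(models, key=lambda sign: models[sign].score(sequence))

# Placeholder training data: a few variable-length recordings per (made-up) sign.
rng = np.random.default_rng(0)
training = {
    "HELLO":     [rng.normal(size=(rng.integers(30, 60), N_FEATURES)) for _ in range(5)],
    "THANK-YOU": [rng.normal(size=(rng.integers(30, 60), N_FEATURES)) for _ in range(5)],
}
models = {sign: train_sign_model(seqs) for sign, seqs in training.items()}

new_recording = rng.normal(size=(45, N_FEATURES))   # placeholder streamed frames
print(classify(new_recording, models))
```

This sidesteps the segmentation problem (it assumes each recording is already one sign), which is exactly the part that makes continuous conversation hard.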
Sign language is a hard problem (even just recognizing single characters). The biggest shortcoming, in my opinion, is that the glove can't register the hand's position in x/y/z space; therefore it's impossible to distinguish an 'I' from a 'J'. To recognize all the single characters we'd have to integrate the glove with some kind of tracker (or someone needs to come up with some brilliant way of using the glove alone).
I agree with Brandon that the largest issue with trying to create a sign-language interpreter is that the hand-tracking system needs a controlled environment. The equipment might be set up in an auditorium, but the cost of the equipment and software would likely outweigh its benefit for the few occasions a sign language interpreter is needed (in most places, anyway).
I think that, instead of full 3D position, accelerometers would provide enough data to distinguish Js and Zs. Although "exact" 3D position is sacrificed, mobility and usability are gained.
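Along those lines, one simple way to use accelerometer data would be to threshold crudely integrated motion to decide whether a letter is a static posture (most of the alphabet) or a traced one (J, Z). A toy sketch of that idea, with made-up sensor values and an uncalibrated threshold:

```python
import numpy as np

MOTION_THRESHOLD = 0.5   # made-up threshold; would need calibration per glove/user

def is_traced_letter(accel_samples, dt=0.01):
    """Return True if the hand moved enough that the letter involves a motion path.

    accel_samples: (n, 3) accelerometer readings (gravity removed), sampled every
    dt seconds. Static postures like 'I' stay near zero velocity; traced letters
    like 'J' or 'Z' accumulate a noticeable peak speed.
    """
    velocity = np.cumsum(np.asarray(accel_samples) * dt, axis=0)   # crude integration
    peak_speed = np.max(np.linalg.norm(velocity, axis=1))
    return peak_speed > MOTION_THRESHOLD

# Placeholder data: a nearly still hand vs. a hand tracing a stroke.
still = np.random.normal(0.0, 0.02, size=(100, 3))
tracing = np.concatenate([np.full((50, 3), 1.2), np.full((50, 3), -1.2)])

print(is_traced_letter(still))    # expected: False
print(is_traced_letter(tracing))  # expected: True
```

Recovering the actual shape of the traced path (to tell J from Z, say) would take more than this, but a motion/no-motion flag alone already resolves the I-versus-J ambiguity raised above.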
Yes, there are certainly systems with higher accuracy out there. But we still haven't seen any system that records two Deaf people talking to each other and attempts to recognize those gestures, judged against a human interpreter. That would expose the problem of natural gestures (which may include informality or laziness in gesture production) as well as the segmentation problem. I'd expect recognition rates to drop tremendously in such a system.