Wednesday, February 6, 2008

Cyber Composer: Hand Gesture-Driven Intelligent Music Composition and Generation (Ip, Law, & Kwong)

Summary:

This work describes a system designed to let both experienced musicians and novices compose music using hand gestures. The authors explain automated music generation in terms of music theory, discussing tonality, chord progression, closure of musical phrases (cadence), and generation of a melody that follows the chords, and how all of these can be partly automated with general rules for what makes a coherent piece of music. They then describe their system architecture and implementation: a pair of CyberGloves with Polhemus 3D position trackers provides input; a music interface converts musical expressions to MIDI signals, which are used to synthesize sound; background music is generated according to music theory and user-defined tempo and key; and the melody is generated from hand gestures, music theory, and a style template.
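The paper describes this generation pipeline only at a high level, but as a rough illustration, here is a minimal Python sketch of the kind of thing they seem to mean: background chords built from a key with a stock progression, and melody notes constrained to tones of the current chord. All of the names, the I-IV-V-I progression, and the note-selection rule are my own assumptions, not the authors' implementation.

# Hypothetical sketch of the described pipeline: background chords from a key,
# melody notes constrained to the current chord (my guesses, not the paper's code).
C_MAJOR = [60, 62, 64, 65, 67, 69, 71]  # MIDI note numbers for the C major scale

def triad(scale, degree):
    """Build a triad (three stacked thirds) on a scale degree."""
    return [scale[(degree + i) % len(scale)] + 12 * ((degree + i) // len(scale))
            for i in (0, 2, 4)]

def chord_progression(scale, degrees=(0, 3, 4, 0)):
    """A I-IV-V-I progression, a stock 'coherent' background pattern."""
    return [triad(scale, d) for d in degrees]

def melody_note(chord, hand_height):
    """Quantize the right hand's relative height (0..1) to a tone of the current chord."""
    t = min(max(hand_height, 0.0), 1.0)
    return chord[min(int(t * len(chord)), len(chord) - 1)]

if __name__ == "__main__":
    for chord in chord_progression(C_MAJOR):
        print(chord, "->", melody_note(chord, hand_height=0.7))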

They describe the specific gesture mapping they chose for the system in depth, based on five guidelines: (1) musical expressions should be intuitive; (2) those requiring fine control should be mapped to agile parts of the hands; (3) the most important expressions should be easily triggered; (4) no two gestures should be too similar; (5) accidental triggering should be avoided. They map rhythm to wrist flexion because it is very important but doesn't require fine movement. Pitch is important, so they map it to the relative height of the right hand, though it resets at each new bar of music. Pitch shifting of melody notes also occurs if the right hand moves far enough relative to its position at the previous melody note. Dynamics (how strongly a note is played) and volume are controlled by right-hand finger flexion: extended fingers mean a stronger note. Lifting the left hand higher than the right hand adds a second instrument, which plays in unison or harmonizes two notes higher. Cadence is triggered when the left-hand fingers bend completely, and keeping the hand closed stops the music.
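To make the mapping concrete, here is a small sketch of how one frame of glove data might be translated into note-event parameters under these rules. The thresholds, field names, and update logic are all my own guesses for illustration, not taken from the paper.

from dataclasses import dataclass

@dataclass
class HandState:
    wrist_flexion: float   # 0.0 (extended) .. 1.0 (fully flexed)
    finger_flexion: float  # average over the fingers, 0.0 .. 1.0
    height: float          # tracker height, normalized 0.0 .. 1.0

def map_gesture(right: HandState, left: HandState,
                flex_threshold=0.8, harmony_interval=2):
    """Translate one frame of glove data into note-event parameters.

    Mirrors the paper's mapping as I read it: wrist flexion triggers the
    rhythm (a new note), right-hand height selects pitch, finger extension
    controls dynamics, a raised left hand adds a harmony voice, and a
    closed left fist signals cadence and then stops the music.
    """
    if left.finger_flexion > flex_threshold:
        return {"event": "cadence_then_stop"}
    event = {
        "event": "note" if right.wrist_flexion > flex_threshold else "sustain",
        "pitch_level": right.height,                          # quantized to the scale later
        "velocity": int(127 * (1.0 - right.finger_flexion)),  # extended fingers = stronger
    }
    if left.height > right.height:                            # second instrument joins
        event["harmony_offset"] = harmony_interval            # e.g. two notes higher
    return event

if __name__ == "__main__":
    r = HandState(wrist_flexion=0.9, finger_flexion=0.1, height=0.6)
    l = HandState(wrist_flexion=0.0, finger_flexion=0.2, height=0.8)
    print(map_gesture(r, l))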

The GUI lets the user choose an instrument, key, tonality, and tempo from drop-down menus (presumably with a mouse) before beginning composition with the CyberGloves.

Discussion:

Due to my lack of knowledge of music composition, I'm not sure I understood all of the automated music generation section; if I did, it seems as though the rules could limit the variety of music that can be composed with this system. Then again, I could be convinced that working within the rules still leaves more than enough flexibility to create interesting, original music, and it makes sense that automating some details makes the system easier to build and lets the user shape the big picture with gestures. The system already seems difficult enough to learn without requiring the user to specify every detail of the music via hand movements, though maybe that could be alleviated with a sufficiently informative and usable GUI.

I think the part of the paper most likely to apply to other applications is their list of guidelines for choosing gesture-to-meaning mappings, should we go on to create our own gesture set. The guidelines seem somewhat obvious, but when designing a system and writing a paper about it, it would be good to have a list of rules like that to check our choices against.

1 comment:

Paul Taele said...

Heh, I chuckled a bit when you wrote about your lack of knowledge of composing music, because of your sketch recognition work. :P

On your comment about the list of rules, that would have been really nice to have. I wish they had done that and toned down or even removed their music theory section; it didn't feel like it justified their design choices for the gestures in their system.