Designing, Playing, and Performing with a Vision-based Mouth Interface
Michael J. Lyons, Michael Haehnel, Nobuji Tetsutani

TL;DR
This paper introduces the Mouthesizer, a vision-based mouth interface that captures facial gestures via a headworn camera to control musical sound, demonstrating its application in live performances.
Contribution
It presents a novel system using computer vision for real-time mouth gesture recognition to control music, with practical applications in live performance settings.
Findings
Effective gesture-to-sound mappings demonstrated
Successful live performance using the Mouthesizer
System captures mouth shape parameters accurately
Abstract
The role of the face and mouth in speech production as well as non-verbal communication suggests the use of facial action to control musical sound. Here we document work on the Mouthesizer, a system which uses a headworn miniature camera and computer vision algorithm to extract shape parameters from the mouth opening and output these as MIDI control changes. We report our experience with various gesture-to-sound mappings and musical applications, and describe a live performance which used the Mouthesizer interface.
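The abstract says that mouth shape parameters are output as MIDI control changes. As a rough illustration of that last step only, the sketch below scales two hypothetical shape parameters (mouth opening width and height, in pixels) to the 0..127 range and packs each into a raw three-byte Control Change message. The pixel ranges, the controller numbers (1 and 74), and all function names are assumptions made for this example and are not taken from the paper. Only the message layout (status byte 0xB0 plus channel, then controller number, then value) follows the MIDI 1.0 convention.

```python
# Illustrative sketch only: the ranges, controller numbers, and names below
# are assumptions, not the Mouthesizer's actual mapping.

def scale_to_midi(value, lo, hi):
    """Linearly map value from [lo, hi] to an integer in 0..127, clamping."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))  # clamp out-of-range measurements
    return round(t * 127)


def control_change(channel, controller, value):
    """Build a raw 3-byte MIDI Control Change message."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be in 0..15")
    if not 0 <= controller <= 127:
        raise ValueError("controller must be in 0..127")
    if not 0 <= value <= 127:
        raise ValueError("value must be in 0..127")
    # 0xB0 is the Control Change status nibble; the low nibble is the channel.
    return bytes([0xB0 | channel, controller, value])


def mouth_to_midi(width_px, height_px, channel=0):
    """Turn two mouth shape parameters into two Control Change messages.

    Assumed (hypothetical) mapping:
      width  in 20..80 px -> controller 1
      height in 0..60 px  -> controller 74
    """
    width_value = scale_to_midi(width_px, 20, 80)
    height_value = scale_to_midi(height_px, 0, 60)
    return [
        control_change(channel, 1, width_value),
        control_change(channel, 74, height_value),
    ]


if __name__ == "__main__":
    for message in mouth_to_midi(width_px=50, height_px=30):
        print(message.hex(" "))
```

In a real setup the resulting bytes would be sent to a MIDI output port once per video frame. Which shape parameter drives which sound parameter is exactly the kind of gesture-to-sound mapping choice the paper reports experimenting with.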
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Multisensory perception and integration
