Gesture2Music: A Low-Latency Real-Time Framework for Continuous Gesture-Driven Music Generation
Rathinaraja Jeyaraj, Barathi Subramanian, Kapilya Gangadharan, and Anand Paul

TL;DR
Gesture2Music introduces a real-time framework that converts live webcam gestures into continuous music, emphasizing low latency, temporal stability, and musical coherence through a novel synthetic data generation and prediction approach.
Contribution
The paper presents a low-latency streaming system that uses a causal temporal convolutional network (TCN) for gesture-driven music generation, together with a synthetic data strategy and loss functions designed to improve continuity and stability.
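To illustrate what "causal" means for a TCN in a streaming setting, here is a minimal NumPy sketch of a single causal dilated 1D convolution. It is not the authors' implementation: the function names, tensor shapes, kernel size, and dilation values are all assumptions chosen for illustration. The key property is that the output at frame t depends only on frames at or before t, so the model never has to wait for future frames.

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation=1):
    """Causal dilated convolution over a landmark sequence (illustrative only).

    x: array of shape (T, C_in), one feature vector per video frame.
    w: array of shape (K, C_in, C_out), K kernel taps.
    Returns an array of shape (T, C_out).
    """
    T, c_in = x.shape
    K, _, c_out = w.shape
    # Left-pad with zeros so that output frame t only sees frames <= t.
    pad = (K - 1) * dilation
    x_padded = np.concatenate([np.zeros((pad, c_in)), x], axis=0)
    y = np.zeros((T, c_out))
    for k in range(K):
        # Tap k reads the frame (K - 1 - k) * dilation steps in the past.
        offset = k * dilation
        y += x_padded[offset:offset + T] @ w[k]
    return y

def receptive_field(kernel_size, dilations):
    """Number of past frames (including the current one) a stack of layers sees."""
    return 1 + (kernel_size - 1) * sum(dilations)

# Hypothetical sizes: 16 frames, 4 input features, 2 output channels.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 4))
w = rng.normal(size=(3, 4, 2))
y = causal_dilated_conv1d(x, w, dilation=2)
```

Stacking such layers with growing dilations widens the temporal context without adding look-ahead, which is why causal TCNs are a common choice when latency matters.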
Findings
Achieves 30 ms inference latency in real-time performance.
Demonstrates stable, continuous music generation from live gestures.
Improves temporal continuity and reduces jitter in gesture-to-music mapping.
Abstract
Gesture-driven music generation is an emerging human-computer interaction paradigm for touch-free and expressive musical interaction. However, many existing approaches treat the task as isolated gesture classification or map gestures to symbolic outputs such as MIDI followed by a separate rendering stage, which limits temporal continuity and real-time responsiveness. This work presents Gesture2Music, a low-latency streaming framework for continuous gesture-driven music generation from a live webcam feed. The system processes sequences of body and hand landmarks and uses a causal temporal convolutional network (TCN) to predict note-level musical control events, including pitch, octave, onset, sustain, amplitude, and activity state. Because available gesture-note datasets typically contain only isolated single-note recordings rather than continuous performance sequences, a synthetic stream…
