Gesture2Music: A Low-Latency Real-Time Framework for Continuous Gesture-Driven Music Generation

Rathinaraja Jeyaraj, Barathi Subramanian, Kapilya Gangadharan, and Anand Paul

arXiv:2511.00793·cs.MM·April 29, 2026

TL;DR

Gesture2Music is a real-time framework that converts live webcam gestures into continuous music. It targets low latency, temporal stability, and musical coherence by combining synthetic training-data generation with a causal prediction model.

Contribution

The paper presents a low-latency streaming system that uses a causal temporal convolutional network (TCN) for gesture-driven music generation, together with a synthetic data strategy and loss functions designed to improve continuity and stability.

Findings

01. Achieves 30 ms inference latency in real-time performance.

02. Demonstrates stable, continuous music generation from live gestures.

03. Improves temporal continuity and reduces jitter in gesture-to-music mapping.

Abstract

Gesture-driven music generation is an emerging human-computer interaction paradigm for touch-free and expressive musical interaction. However, many existing approaches treat the task as isolated gesture classification or map gestures to symbolic outputs such as MIDI followed by a separate rendering stage, which limits temporal continuity and real-time responsiveness. This work presents Gesture2Music, a low-latency streaming framework for continuous gesture-driven music generation from a live webcam feed. The system processes sequences of body and hand landmarks and uses a causal temporal convolutional network (TCN) to predict note-level musical control events, including pitch, octave, onset, sustain, amplitude, and activity state. Because available gesture-note datasets typically contain only isolated single-note recordings rather than continuous performance sequences, a synthetic stream…
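The causal TCN named in the abstract rests on one property: the output at frame t may depend only on frames at or before t, which is what lets the model run on a live stream without waiting for future input. The following is a minimal pure-Python sketch of that property, not the authors' implementation. The function names, kernel weights, and dilation values are all hypothetical, since this page does not specify the architecture.

```python
from typing import List, Sequence


def causal_dilated_conv1d(
    x: Sequence[float], weights: Sequence[float], dilation: int = 1
) -> List[float]:
    """Causal 1-D convolution over a single feature channel.

    weights[0] multiplies the current frame and weights[k] multiplies the
    frame k * dilation steps in the past. Frames before the start of the
    stream are treated as zeros, so no future frame is ever read.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # implicit left zero-padding
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Frames (current plus past) visible to a stack with one conv per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)


def causal_stack(
    x: Sequence[float],
    layer_weights: Sequence[Sequence[float]],
    dilations: Sequence[int],
) -> List[float]:
    """Stack of causal dilated convolutions with ReLU between layers."""
    h = list(x)
    for w, d in zip(layer_weights, dilations):
        h = [max(0.0, v) for v in causal_dilated_conv1d(h, w, d)]
    return h


if __name__ == "__main__":
    # Hypothetical one-channel "landmark" stream and a three-layer stack.
    stream = [0.1, 0.4, 0.3, 0.9, 0.5, 0.2, 0.7, 0.6]
    weights = [[0.5, 0.3, 0.2]] * 3
    dilations = [1, 2, 4]
    print(causal_stack(stream, weights, dilations))
    print("receptive field:", receptive_field(3, dilations))
```

In this sketch, doubling the dilation at each layer grows the temporal context quickly while the per-frame cost stays fixed. In a real system each frame would carry many landmark coordinates, and separate output heads would predict the pitch, octave, onset, sustain, amplitude, and activity-state events listed in the abstract.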
