LiveGesture: Streamable Co-Speech Gesture Generation Model

Muhammad Usama Saleem; Mayur Jagdishbhai Patel; Ekkasit Pinyoanuntapong; Zhongxing Qin; Li Yang; Hongfei Xue; Ahmed Helmy; Chen Chen; Pu Wang

arXiv:2604.10927 · cs.CV · April 14, 2026

TL;DR

LiveGesture is a novel real-time, streamable model for generating full-body co-speech gestures from audio, operating with zero look-ahead and supporting arbitrary sequence lengths.

Contribution

LiveGesture introduces a fully streamable, causal gesture generation framework that combines region-coordinated motion modeling with robustness enhancements for real-time applications.

Findings

01 Produces coherent, diverse, beat-synchronous gestures in real time.

02 Outperforms or matches state-of-the-art offline methods under zero look-ahead.

03 Supports arbitrary sequence length with causal, region-coordinated modeling.

Abstract

We propose LiveGesture, the first fully streamable, speech-driven full-body gesture generation framework that operates with zero look-ahead and supports arbitrary sequence length. Unlike existing co-speech gesture methods, which are designed for offline generation and either treat body regions independently or entangle all joints within a single model, LiveGesture is built from the ground up for causal, region-coordinated motion generation. LiveGesture consists of two main modules: the Streamable Vector Quantized Motion Tokenizer (SVQ) and the Hierarchical Autoregressive Transformer (HAR). The SVQ tokenizer converts the motion sequence of each body region into causal, discrete motion tokens, enabling real-time, streamable token decoding. On top of SVQ, HAR employs region-expert autoregressive (xAR) transformers to model expressive, fine-grained motion dynamics for each body region. A…
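
To make the "causal, discrete motion tokens" idea concrete, here is a minimal sketch of zero-look-ahead tokenization. It is not the authors' SVQ implementation: the left-padded convolution, the nearest-neighbor codebook lookup, the function names (`causal_conv1d`, `quantize`, `tokenize`), and all sizes are assumptions chosen for illustration only. The point it demonstrates is that the token at frame t depends only on frames up to t, so tokenizing a prefix of a stream yields the same tokens as tokenizing the full sequence.

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Causal 1D convolution over time (illustrative, not the paper's encoder).

    x: (T, D) motion frames for one body region.
    kernel: (K, D, H) weights.
    Output: (T, H), where output[t] uses only x[t-K+1 .. t].
    """
    K = kernel.shape[0]
    # Left-pad with zeros so that no future frame is ever read.
    padded = np.concatenate([np.zeros((K - 1, x.shape[1])), x], axis=0)
    return np.stack([
        np.einsum("kd,kdh->h", padded[t:t + K], kernel)
        for t in range(x.shape[0])
    ])

def quantize(latents, codebook):
    """Nearest-codebook lookup: one discrete token index per frame."""
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def tokenize(motion, kernel, codebook):
    """Hypothetical causal tokenizer: causal encoder followed by quantization."""
    return quantize(causal_conv1d(motion, kernel), codebook)

# Hypothetical sizes: 10 frames, 6 motion features, 4 latent dims, 16 codes.
rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 6, 4))
codebook = rng.normal(size=(16, 4))
motion = rng.normal(size=(10, 6))

full_tokens = tokenize(motion, kernel, codebook)
prefix_tokens = tokenize(motion[:7], kernel, codebook)

# With zero look-ahead, tokens for the first 7 frames do not change
# when later frames arrive.
streaming_consistent = bool(np.array_equal(full_tokens[:7], prefix_tokens))
```

In a streaming setting, this property is what would let tokens be emitted frame by frame as motion arrives, rather than waiting for the whole sequence.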
