MIBURI: Towards Expressive Interactive Gesture Synthesis

M. Hamza Mughal; Rishabh Dabral; Vera Demberg; Christian Theobalt

arXiv:2603.03282·cs.CV·March 30, 2026

TL;DR

MIBURI is an online framework that generates expressive, synchronized full-body gestures and facial expressions for Embodied Conversational Agents (ECAs) in real time, using hierarchical motion encoding and causal autoregressive modeling.

Contribution

MIBURI introduces the first real-time, causal system for expressive gesture synthesis conditioned on speech, combining hierarchical motion encoding with LLM-based context understanding.

Findings

01

Produces natural, contextually aligned gestures in real time.

02

Outperforms recent baselines in naturalness and expressiveness.

03

Enables expressive gestures without depending on future speech context or long run-times.

Abstract

Embodied Conversational Agents (ECAs) aim to emulate human face-to-face interaction through speech, gestures, and facial expressions. Current large language model (LLM)-based conversational agents lack embodiment and the expressive gestures essential for natural interaction. Existing solutions for ECAs often produce rigid, low-diversity motions that are unsuitable for human-like interaction. Alternatively, generative methods for co-speech gesture synthesis yield natural body gestures but depend on future speech context and require long run-times. To bridge this gap, we present MIBURI, the first online, causal framework for generating expressive full-body gestures and facial expressions synchronized with real-time spoken dialogue. We employ body-part aware gesture codecs that encode hierarchical motion details into multi-level discrete tokens. These tokens are then autoregressively…
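To make the phrase "multi-level discrete tokens" more concrete, here is a minimal sketch of one common way to obtain such tokens: residual quantization, where each level quantizes what the previous level left unexplained. This is an illustrative assumption, not the codec described in the paper. The names `nearest_code`, `residual_quantize`, `decode`, and `pose_feature`, the random codebooks, and all sizes are hypothetical, and the sketch does not model the body-part awareness mentioned in the abstract.

```python
import random


def nearest_code(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def sq_dist(code):
        return sum((c - v) ** 2 for c, v in zip(code, vec))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def residual_quantize(vec, codebooks):
    """Encode vec as one token per level.

    Each level picks the nearest code for the current residual, then
    subtracts that code so the next level sees what is still unexplained.
    """
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest_code(codebook, residual)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens, residual


def decode(tokens, codebooks):
    """Rebuild an approximation by summing the selected code from each level."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy setup: random codebooks stand in for learned ones.
rng = random.Random(0)
DIM, LEVELS, CODES = 4, 3, 8
codebooks = [
    # Later levels use smaller-scale codes, mimicking finer motion detail.
    [[rng.gauss(0.0, 1.0 / 2 ** level) for _ in range(DIM)] for _ in range(CODES)]
    for level in range(LEVELS)
]

pose_feature = [rng.gauss(0.0, 1.0) for _ in range(DIM)]  # stand-in for one motion frame
tokens, residual = residual_quantize(pose_feature, codebooks)
reconstruction = decode(tokens, codebooks)
print(tokens)  # one discrete token per level, coarse to fine
```

In a scheme like this, a causal autoregressive model would predict the per-level tokens for each new frame from past frames only, which is what allows online generation without access to future speech.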
