PersonaGesture: Single-Reference Co-Speech Gesture Personalization for Unseen Speakers

Xiangyue Zhang; Yiyi Cai; Kunhang Li; Kaixing Yang; You Zhou; Zhengqing Li; Xuangeng Chu; Jiaxu Zhang; Haiyang Liu

arXiv:2605.06064 · cs.CV · May 8, 2026

TL;DR

PersonaGesture introduces a diffusion-based method for personalized co-speech gesture synthesis for unseen speakers using only a single reference clip, improving style retention without per-speaker training.

Contribution

The paper presents a diffusion pipeline with two components, Adaptive Style Infusion (ASI) and Implicit Distribution Rectification (IDR), that personalizes gestures for unseen speakers from a single reference motion clip.

Findings

01. Outperforms existing methods in unseen speaker personalization metrics.

02. Effectively separates speaker style from utterance-specific gestures.

03. Achieves high human preference scores in qualitative evaluations.

Abstract

We propose PersonaGesture, a diffusion-based pipeline for single-reference co-speech gesture personalization of unseen speakers. Given target speech and one motion clip from a new speaker, the model must synthesize gestures that follow the new utterance while retaining speaker-specific pose choices, without per-speaker optimization. This setting is useful for avatars and virtual agents, but it is hard because the reference mixes stable speaker habits with utterance-specific trajectories. PersonaGesture consists of two key components, Adaptive Style Infusion (ASI) and Implicit Distribution Rectification (IDR), to separate temporal identity evidence from residual statistic correction. A Style Perceiver first encodes the variable-length reference into compact speaker-memory tokens. ASI injects these tokens into denoising through zero-initialized residual cross-attention, enabling style…
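
The zero-initialized residual cross-attention that the abstract attributes to ASI can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the class name `ZeroInitCrossAttention`, the choice to zero the output projection specifically, the single attention head, and all sizes are assumptions made for illustration. What it shows is the general property of such a layer: at initialization the residual branch contributes nothing, so the denoiser's hidden states pass through unchanged, and the style tokens only start to matter once the output projection moves away from zero.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ZeroInitCrossAttention:
    """Hypothetical single-head residual cross-attention layer.

    Assumption: "zero-initialized" is modeled by setting the output
    projection to zeros; the abstract does not say which weights are zeroed.
    """

    def __init__(self, dim, rng):
        self.dim = dim
        self.w_q = rand_matrix(dim, dim, rng)
        self.w_k = rand_matrix(dim, dim, rng)
        self.w_v = rand_matrix(dim, dim, rng)
        self.w_out = [[0.0] * dim for _ in range(dim)]  # zero init

    def __call__(self, hidden, style_tokens):
        # hidden: (frames, dim) denoiser states; style_tokens: (tokens, dim)
        q = matmul(hidden, self.w_q)
        k = matmul(style_tokens, self.w_k)
        v = matmul(style_tokens, self.w_v)
        scale = 1.0 / math.sqrt(self.dim)
        scores = matmul(q, transpose(k))  # (frames, tokens)
        attn = [softmax([s * scale for s in row]) for row in scores]
        update = matmul(matmul(attn, v), self.w_out)
        # Residual connection: hidden + attention update
        return [[h + u for h, u in zip(h_row, u_row)]
                for h_row, u_row in zip(hidden, update)]


rng = random.Random(0)
dim = 4
layer = ZeroInitCrossAttention(dim, rng)
hidden = rand_matrix(6, dim, rng, scale=1.0)        # 6 motion frames
style_tokens = rand_matrix(3, dim, rng, scale=1.0)  # 3 speaker-memory tokens

out_at_init = layer(hidden, style_tokens)

# Simulate training having moved the output projection away from zero.
layer.w_out = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
out_after_update = layer(hidden, style_tokens)
```

Here `out_at_init` equals `hidden`, while `out_after_update` differs from it. In the setting the abstract describes, `style_tokens` would stand in for the speaker-memory tokens produced by the Style Perceiver from the reference clip; in this sketch they are random placeholders.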

Code & Models

Repositories

https://xiangyue-zhang.github.io/PersonaGesture