Co$^{3}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion
Xingqun Qi, Yatian Wang, Hengyuan Zhang, Jiahao Pan, Wei Xue, Shanghang Zhang, Wenhan Luo, Qifeng Liu, Yike Guo

TL;DR
This paper introduces Co$^{3}$Gesture, a novel framework for generating coherent two-person co-speech gestures using a large-scale dataset and interactive diffusion techniques, advancing virtual avatar animation in interactive settings.
Contribution
The paper presents a new large-scale dataset GES-Inter and a novel Co$^{3}$Gesture framework with a Temporal Interaction Module and mutual attention for improved co-speech gesture synthesis.
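To make the idea of mutual attention between two speaker branches concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names, the absence of learned projection weights, and the residual addition are all assumptions made for illustration. Each speaker's per-frame features act as queries over the other speaker's frames, using standard scaled dot-product attention with a softmax.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention: every query frame attends over the
    other speaker's frames (used here as both keys and values)."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        out.append([
            sum(w * v[d] for w, v in zip(weights, keys_values))
            for d in range(dim)
        ])
    return out


def mutual_attention(feats_a, feats_b):
    """Hypothetical mutual attention: each branch queries the other, and the
    result is added back to the branch's own features (assumed residual)."""
    a_from_b = cross_attention(feats_a, feats_b)
    b_from_a = cross_attention(feats_b, feats_a)
    new_a = [[x + y for x, y in zip(fa, ca)] for fa, ca in zip(feats_a, a_from_b)]
    new_b = [[x + y for x, y in zip(fb, cb)] for fb, cb in zip(feats_b, b_from_a)]
    return new_a, new_b


# Toy usage: speaker A has two frames, speaker B has one, feature size 2.
feats_a = [[1.0, 0.0], [0.0, 1.0]]
feats_b = [[2.0, 3.0]]
new_a, new_b = mutual_attention(feats_a, feats_b)
```

In a real model the queries, keys, and values would pass through learned linear projections and the two sequences would typically share a time axis; the sketch omits both to keep the data flow between the two branches visible.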
Findings
Outperforms state-of-the-art models on the GES-Inter dataset
Effectively models interaction dynamics between two speakers
Generates vivid, coherent two-person gestures
Abstract
Generating gestures from human speech has gained tremendous progress in animating virtual avatars. While the existing methods enable synthesizing gestures cooperated by individual self-talking, they overlook the practicality of concurrent gesture modeling with two-person interactive conversations. Moreover, the lack of high-quality datasets with concurrent co-speech gestures also limits handling this issue. To fulfill this goal, we first construct a large-scale concurrent co-speech gesture dataset that contains more than 7M frames for diverse two-person interactive posture sequences, dubbed GES-Inter. Additionally, we propose Co$^{3}$Gesture, a novel framework that enables coherent concurrent co-speech gesture synthesis including two-person interactive movements. Considering the asymmetric body dynamics of two speakers, our framework is built upon two cooperative generation branches…
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Motion and Animation · Human Pose and Action Recognition
Methods: Softmax · Attention Is All You Need
