VarGes: Improving Variation in Co-Speech 3D Gesture Generation via StyleCLIPS
Ming Meng, Ke Mu, Yonggui Zhu, Zhe Zhu, Haoyu Sun, Heyang Yan, Zhaoxin Fan

TL;DR
VarGes introduces a novel framework that enhances co-speech 3D gesture generation by integrating visual style cues and audio features, resulting in more diverse and natural gestures in virtual human interactions.
Contribution
The paper presents VarGes, a new variation-driven approach combining style-reference video data and advanced encoding to improve gesture diversity and naturalness in co-speech gesture synthesis.
Findings
Outperforms existing methods in gesture diversity
Achieves higher naturalness in generated gestures
Validated on benchmark datasets
Abstract
Generating expressive and diverse human gestures from audio is crucial in fields like human-computer interaction, virtual reality, and animation. Though existing methods have achieved remarkable performance, they often exhibit limitations due to constrained dataset diversity and the restricted amount of information derived from audio inputs. To address these challenges, we present VarGes, a novel variation-driven framework designed to enhance co-speech gesture generation by integrating visual stylistic cues while maintaining naturalness. Our approach begins with the Variation-Enhanced Feature Extraction (VEFE) module, which seamlessly incorporates style-reference video data into a 3D human pose estimation network to extract StyleCLIPS, thereby enriching the input with stylistic information. Subsequently, we employ the Variation-Compensation Style Encoder (VCSE), a…
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Hand Gesture Recognition Systems
Methods: Softmax · Attention Is All You Need · Tanh Activation
