Towards Unified Co-Speech Gesture Generation via Hierarchical Implicit Periodicity Learning
Xin Guo, Yifan Zhao, Jia Li

TL;DR
This paper introduces a hierarchical implicit periodicity learning method for generating realistic 3D co-speech gestures by modeling inter- and intra-correlations across different motion units, improving naturalness and coordination.
Contribution
The paper proposes a hierarchical implicit periodicity learning approach that captures intrinsic correlations in gesture movements and outperforms existing methods in co-speech gesture generation.
Findings
Outperforms state-of-the-art methods in quantitative evaluations.
Produces more natural and coordinated gestures.
Demonstrates effectiveness on 3D avatar datasets.
Abstract
Generating 3D body movements from speech shows great potential for a wide range of downstream applications, but it still struggles to imitate realistic human movements. Predominant research efforts focus on end-to-end schemes for generating co-speech gestures, spanning GANs, VQ-VAEs, and recent diffusion models. We argue in this paper that, for this ill-posed problem, these prevailing learning schemes fail to model crucial inter- and intra-correlations across different motion units, i.e., head, body, and hands, leading to unnatural movements and poor coordination. To delve into these intrinsic correlations, we propose a unified Hierarchical Implicit Periodicity (HIP) learning approach for audio-inspired 3D gesture generation. Unlike predominant research, our approach models this multi-modal implicit relationship through two explicit technical insights: i) To…
Taxonomy
Topics: Human Motion and Animation · Face recognition and analysis · Speech and Audio Processing
