Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation

Xian Liu; Qianyi Wu; Hang Zhou; Yinghao Xu; Rui Qian; Xinyi Lin; Xiaowei Zhou; Wayne Wu; Bo Dai; Bolei Zhou

arXiv:2203.13161 · cs.CV · March 25, 2022 · 6 citations

TL;DR

This paper introduces HA2G, a hierarchical framework for generating realistic co-speech gestures by leveraging multi-granularity audio representations and hierarchical pose inference, outperforming previous methods.

Contribution

The paper proposes a novel hierarchical framework that captures multi-granularity speech semantics and generates detailed co-speech gestures, improving over holistic approaches.

Findings

01. Outperforms previous methods in gesture realism

02. Effective multi-granularity audio representation extraction

03. Human evaluation confirms improved gesture naturalness

Abstract

Generating speech-consistent body and gesture movements is a long-standing problem in virtual avatar creation. Previous studies often synthesize pose movement in a holistic manner, where poses of all joints are generated simultaneously. Such a straightforward pipeline fails to generate fine-grained co-speech gestures. One observation is that the hierarchical semantics in speech and the hierarchical structures of human gestures can be naturally described into multiple granularities and associated together. To fully utilize the rich connections between speech audio and human gestures, we propose a novel framework named Hierarchical Audio-to-Gesture (HA2G) for co-speech gesture generation. In HA2G, a Hierarchical Audio Learner extracts audio representations across semantic granularities. A Hierarchical Pose Inferer subsequently renders the entire human pose gradually in a hierarchical…
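The coarse-to-fine idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it only assumes that joints are predicted level by level, each level conditioned on an audio feature of matching granularity and on the joints already produced. The joint grouping in `LEVELS`, the helpers `make_linear`, `apply_linear`, and `infer_pose`, and the random weights standing in for learned layers are all hypothetical.

```python
import random

# Hypothetical joint hierarchy, ordered coarse to fine (not the paper's grouping).
LEVELS = [
    ["spine", "neck"],
    ["l_shoulder", "r_shoulder"],
    ["l_elbow", "r_elbow"],
    ["l_wrist", "r_wrist"],
]


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix (list of rows); a stand-in for a learned layer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, x):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def infer_pose(audio_feats, levels=LEVELS, seed=0):
    """Predict 2D joint coordinates level by level.

    audio_feats: one feature vector per level, ordered coarse to fine.
    Returns a dict mapping joint name -> (x, y).
    """
    rng = random.Random(seed)
    pose = {}
    context = []  # flattened coordinates of all joints predicted so far
    for level_joints, feat in zip(levels, audio_feats):
        # Each level sees its own audio feature plus the coarser joints.
        inp = list(feat) + context
        weights = make_linear(len(inp), 2 * len(level_joints), rng)
        out = apply_linear(weights, inp)
        for i, joint in enumerate(level_joints):
            pose[joint] = (out[2 * i], out[2 * i + 1])
        context += out
    return pose


if __name__ == "__main__":
    feats = [[0.1] * 4 for _ in LEVELS]  # placeholder audio features
    for joint, xy in infer_pose(feats).items():
        print(joint, xy)
```

In this sketch each level only reads the audio feature at its own granularity and the joints predicted before it, so changing the finest audio feature leaves the coarse joints untouched. A holistic baseline would instead map a single audio feature to all joints in one step.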

Code & Models

Repositories

alvinliu0/HA2G (PyTorch, official)

Taxonomy

Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Human Motion and Animation

Methods: Contrastive Learning