Emphasizing Semantic Consistency of Salient Posture for Speech-Driven Gesture Generation
Fengqi Liu, Hexiang Wang, Jingyu Gong, Ran Yi, Qianyu Zhou, Xuequan Lu, Jiangbo Lu, Lizhuang Ma

TL;DR
This paper introduces a speech-driven gesture generation method that emphasizes the semantic consistency of salient postures. It learns a joint semantic space for audio and body pose, detects salient postures, and focuses on high-level speech semantics, which improves the quality of the synthesized gestures.
Contribution
The paper proposes three components for gesture synthesis: a joint manifold space for audio and pose representations, a weakly-supervised salient posture detector, and separate feature extraction for facial and body gestures.
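The joint manifold space can be pictured as two mappings that place audio features and pose features in one shared space, where the two modalities can be compared directly. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain linear projections where the paper presumably uses learned neural encoders, and it measures cross-modal agreement with cosine distance. The names `project`, `cosine`, and `joint_space_distance` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def project(weights: Matrix, x: Sequence[float]) -> Vector:
    """Apply a linear map (one row per shared-space dimension) to x.

    Assumption: a linear projection stands in for a learned encoder.
    """
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def cosine(u: Sequence[float], v: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity; eps guards against division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def joint_space_distance(
    audio_feat: Sequence[float],
    pose_feat: Sequence[float],
    w_audio: Matrix,
    w_pose: Matrix,
) -> float:
    """Project both modalities into the shared space and return 1 - cosine."""
    if len(w_audio) != len(w_pose):
        raise ValueError("both projections must target the same shared dimension")
    z_audio = project(w_audio, audio_feat)  # audio embedding in shared space
    z_pose = project(w_pose, pose_feat)     # pose embedding in shared space
    return 1.0 - cosine(z_audio, z_pose)
```

For example, with 3-dimensional audio features, 4-dimensional pose features, and a 2-dimensional shared space, `joint_space_distance` returns a value near 0 when the projected vectors point the same way and near 1 when they are orthogonal, even though the raw features have different sizes.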
Findings
Outperforms state-of-the-art methods in gesture quality.
Effectively captures salient postures aligned with speech semantics.
Enhances semantic consistency in gesture generation.
Abstract
Speech-driven gesture generation aims to synthesize a gesture sequence synchronized with the input speech signal. Previous methods leverage neural networks to directly map a compact audio representation to the gesture sequence, ignoring the semantic association between modalities and failing to handle salient gestures. In this paper, we propose a novel speech-driven gesture generation method that emphasizes the semantic consistency of salient postures. Specifically, we first learn a joint manifold space for the individual representations of audio and body pose to exploit the inherent semantic association between the two modalities, and enforce semantic consistency via a consistency loss. Furthermore, we emphasize the semantic consistency of salient postures by introducing a weakly-supervised detector to identify salient postures, and reweighting the consistency loss to…
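The abstract is cut off before it says exactly how the consistency loss is reweighted, so the following is only a hypothetical sketch of the general idea: measure per-frame consistency between audio and pose embeddings in the joint space, and let frames flagged by a salient-posture detector count more. The per-frame weight w_t = 1 + alpha * s_t, the saliency scores s_t in [0, 1], the cosine-based per-frame loss, and the names `salient_consistency_loss` and `alpha` are all assumptions, not details taken from the paper.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity; eps guards against division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def salient_consistency_loss(
    audio_seq: Sequence[Sequence[float]],
    pose_seq: Sequence[Sequence[float]],
    saliency: Sequence[float],
    alpha: float = 1.0,
) -> float:
    """Weighted mean of per-frame (1 - cosine) between joint-space embeddings.

    Hypothetical weighting: frame t gets weight 1 + alpha * saliency[t], so
    frames the detector scores as salient contribute more to the loss.
    """
    if not (len(audio_seq) == len(pose_seq) == len(saliency)):
        raise ValueError("all inputs must have one entry per frame")
    if not saliency:
        raise ValueError("need at least one frame")
    if alpha < 0 or any(not 0.0 <= s <= 1.0 for s in saliency):
        raise ValueError("expected alpha >= 0 and saliency scores in [0, 1]")
    weights = [1.0 + alpha * s for s in saliency]
    per_frame = [1.0 - cosine(a, p) for a, p in zip(audio_seq, pose_seq)]
    return sum(w * l for w, l in zip(weights, per_frame)) / sum(weights)
```

With `alpha = 0` this reduces to an unweighted mean over frames; increasing `alpha` shifts the loss toward salient frames while still penalizing inconsistency on the remaining frames.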
Taxonomy
Topics: Hearing Impairment and Communication · Speech and dialogue systems · Hand Gesture Recognition Systems
Methods: Focus
