Emphasizing Semantic Consistency of Salient Posture for Speech-Driven Gesture Generation

Fengqi Liu; Hexiang Wang; Jingyu Gong; Ran Yi; Qianyu Zhou; Xuequan Lu; Jiangbo Lu; Lizhuang Ma

arXiv:2410.13786 · cs.CV · October 18, 2024

TL;DR

This paper introduces a speech-driven gesture generation method that emphasizes the semantic consistency of salient postures. It learns a joint semantic space for audio and pose, detects salient postures, and focuses on high-level speech semantics, which the authors report improves synthesis quality.

Contribution

It proposes a joint manifold space for audio and pose representations, a weakly-supervised salient posture detector, and separate feature extraction for face and body gestures, advancing gesture synthesis.

Findings

01. Outperforms state-of-the-art methods in gesture quality.

02. Effectively captures salient postures aligned with speech semantics.

03. Enhances semantic consistency in gesture generation.

Abstract

Speech-driven gesture generation aims at synthesizing a gesture sequence synchronized with the input speech signal. Previous methods leverage neural networks to directly map a compact audio representation to the gesture sequence, ignoring the semantic association of different modalities and failing to deal with salient gestures. In this paper, we propose a novel speech-driven gesture generation method by emphasizing the semantic consistency of salient posture. Specifically, we first learn a joint manifold space for the individual representation of audio and body pose to exploit the inherent semantic association between two modalities, and propose to enforce semantic consistency via a consistency loss. Furthermore, we emphasize the semantic consistency of salient postures by introducing a weakly-supervised detector to identify salient postures, and reweighting the consistency loss to…
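The reweighting idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function name `weighted_consistency_loss`, the use of cosine distance as the per-frame consistency term, and the `1 + saliency` weighting are all hypothetical. The abstract is truncated before it describes the actual loss, so this only shows the general pattern of giving salient frames more weight when comparing audio and pose embeddings in a shared space.

```python
import numpy as np


def weighted_consistency_loss(audio_emb, pose_emb, saliency, eps=1e-8):
    """Hypothetical saliency-weighted consistency loss (illustrative only).

    audio_emb, pose_emb: arrays of shape (T, D), per-frame embeddings that
        are assumed to already live in a shared (joint) space.
    saliency: array of shape (T,), non-negative per-frame saliency scores,
        standing in for the output of a salient posture detector.
    """
    # L2-normalise each frame so the dot product is a cosine similarity.
    a = audio_emb / (np.linalg.norm(audio_emb, axis=1, keepdims=True) + eps)
    p = pose_emb / (np.linalg.norm(pose_emb, axis=1, keepdims=True) + eps)

    # Per-frame cosine distance: 0 when aligned, up to 2 when opposite.
    per_frame = 1.0 - np.sum(a * p, axis=1)

    # Assumed weighting: every frame counts once, salient frames count more.
    weights = 1.0 + saliency

    # Weighted mean over frames.
    return float(np.sum(weights * per_frame) / np.sum(weights))


# Two frames: the first pair is aligned, the second pair is orthogonal.
audio = np.array([[1.0, 0.0], [1.0, 0.0]])
pose = np.array([[1.0, 0.0], [0.0, 1.0]])

uniform = weighted_consistency_loss(audio, pose, np.zeros(2))
salient = weighted_consistency_loss(audio, pose, np.array([0.0, 1.0]))
```

With these toy inputs, marking the mismatched second frame as salient raises the loss relative to uniform weighting, which is the intended effect of emphasizing salient postures in this sketch.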

Taxonomy

Topics: Hearing Impairment and Communication · Speech and dialogue systems · Hand Gesture Recognition Systems

Methods: Focus