Enabling Synergistic Full-Body Control in Prompt-Based Co-Speech Motion Generation

Bohong Chen; Yumeng Li; Yao-Xiang Ding; Tianjia Shao; Kun Zhou

arXiv:2410.00464 · cs.CV · October 2, 2024

TL;DR

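This paper introduces SynTalker, a method for full-body co-speech motion generation that enables precise control of body movements from speech and text prompts. It uses a text-to-motion dataset to compensate for the limited full-body motion and missing prompt annotations in speech-to-motion datasets.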

Contribution

The paper presents a multi-stage training and diffusion-based inference approach to achieve synergistic full-body motion control from text prompts and speech.

Findings

01 Supports precise full-body motion control

02 Handles diverse human activities beyond training data

03 Outperforms existing methods in flexibility and accuracy

Abstract

Current co-speech motion generation approaches usually focus on upper-body gestures that follow speech content only, and lack support for elaborate control of synergistic full-body motion based on text prompts, such as talking while walking. The major challenges are that 1) existing speech-to-motion datasets involve only highly limited full-body motions, leaving a wide range of common human activities out of the training distribution; and 2) these datasets also lack annotated user prompts. To address these challenges, we propose SynTalker, which uses an off-the-shelf text-to-motion dataset as an auxiliary source to supply the missing full-body motion and prompts. The core technical contributions are two-fold. One is the multi-stage training process, which obtains an aligned embedding space of motion, speech, and prompts despite the significant distributional mismatch in motion…

Code & Models

Repositories

RobinWitch/SynTalker
pytorch

Taxonomy

Topics: Speech and dialogue systems · Social Robot Interaction and HRI · Robotics and Automated Systems
