T3M: Text Guided 3D Human Motion Synthesis from Speech

Wenshuo Peng; Kaipeng Zhang; Sai Qian Zhang

arXiv:2408.12885·cs.CV·September 27, 2024

TL;DR

T3M introduces a text-guided approach for 3D human motion synthesis from speech, enabling more accurate, diverse, and customizable animations compared to speech-only methods.

Contribution

The paper proposes T3M, a novel method that incorporates textual input to control 3D human motion synthesis, improving flexibility and performance over existing speech-driven approaches.

Findings

01. T3M outperforms state-of-the-art methods in quantitative metrics.

02. T3M provides more diverse and user-controlled motion synthesis.

03. The approach is validated through qualitative evaluations.

Abstract

Speech-driven 3D motion synthesis seeks to create lifelike animations based on human speech, with potential uses in virtual reality, gaming, and film production. Existing approaches rely solely on speech audio for motion generation, leading to inaccurate and inflexible synthesis results. To mitigate this problem, we introduce a novel text-guided 3D human motion synthesis method, termed T3M. Unlike traditional approaches, T3M allows precise control over motion synthesis via textual input, enhancing the degree of diversity and user customization. The experimental results demonstrate that T3M can greatly outperform the state-of-the-art methods in both quantitative metrics and qualitative evaluations. We have publicly released our code at https://github.com/Gloria2tt/T3M.git.

Code & Models

Repositories

gloria2tt/t3m
PyTorch (official)
