T2M-X: Learning Expressive Text-to-Motion Generation from Partially Annotated Data

Mingdian Liu; Yilin Liu; Gurunandan Krishnan; Karl S Bayer; Bing Zhou

arXiv:2409.13251 · cs.CV · September 23, 2024

Open Access

TL;DR

T2M-X is a novel two-stage approach that generates expressive, whole-body humanoid animations from text prompts by learning from partially annotated data, improving motion quality and coordination.

Contribution

It introduces a multi-part VQ-VAE and GPT-based framework for comprehensive text-to-motion generation from limited annotations, addressing dataset inconsistency issues.

Findings

01. Significant quantitative improvements over baselines.

02. High-quality, expressive whole-body motion outputs.

03. Robustness against dataset limitations.

Abstract

The generation of humanoid animation from text prompts can profoundly impact animation production and AR/VR experiences. However, existing methods only generate body motion data, excluding facial expressions and hand movements. This limitation, primarily due to a lack of a comprehensive whole-body motion dataset, inhibits their readiness for production use. Recent attempts to create such a dataset have resulted in either motion inconsistency among different body parts in the artificially augmented data or lower quality in the data extracted from RGB videos. In this work, we propose T2M-X, a two-stage method that learns expressive text-to-motion generation from partially annotated data. T2M-X trains three separate Vector Quantized Variational AutoEncoders (VQ-VAEs) for body, hand, and face on respective high-quality data sources to ensure high-quality motion outputs, and a Multi-indexing…
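
The abstract describes separate VQ-VAEs for body, hand, and face. As a rough illustration of the quantization step such a model relies on, the sketch below maps a continuous latent vector to the index of its nearest codebook entry, with one codebook per body part. It is not the authors' implementation: the codebook sizes, the latent dimension, the function names, and the choice to skip parts that have no annotation are all assumptions made for this example, and the encoder, decoder, training losses, and second stage are left out.

```python
import random


def quantize(latent, codebook):
    """Return (index, code vector) of the codebook entry nearest to `latent`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(latent, codebook[i]))
    return index, codebook[index]


def make_codebook(num_codes, dim, rng):
    """Build a random codebook; a trained model would learn these vectors."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_codes)]


def tokenize_frame(part_latents, codebooks):
    """Map each provided part latent to a discrete token.

    Assumption for this sketch: parts without annotation are simply absent
    from `part_latents` and therefore produce no token.
    """
    return {part: quantize(z, codebooks[part])[0] for part, z in part_latents.items()}


rng = random.Random(0)

# Hypothetical sizes: 8 codes of dimension 4 per part.
codebooks = {
    "body": make_codebook(8, 4, rng),
    "hand": make_codebook(8, 4, rng),
    "face": make_codebook(8, 4, rng),
}

# A frame where only the body latent is available.
frame = {"body": list(codebooks["body"][3])}
tokens = tokenize_frame(frame, codebooks)
print(tokens)
```

Keeping one codebook per part means each part can be tokenized from whichever data source covers it, which is the property the abstract attributes to training the three VQ-VAEs on their respective data sources.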

Taxonomy

Topics: Human Motion and Animation · Natural Language Processing Techniques · Multimodal Machine Learning Applications