MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls

Yuxuan Bian; Ailing Zeng; Xuan Ju; Xian Liu; Zhaoyang Zhang; Wei Liu; Qiang Xu

arXiv:2407.21136·cs.CV·August 27, 2024

Open Access · 1 Repo

TL;DR

MotionCraft introduces a unified diffusion transformer for multimodal whole-body motion generation, effectively handling diverse control modalities and motion formats through a coarse-to-fine training strategy and novel graph modeling.

Contribution

The paper presents MotionCraft, a novel framework with plug-and-play multimodal control, a two-stage training process, and a new benchmark, MC-Bench, for improved multimodal motion generation.

Findings

01

Achieves state-of-the-art results on multiple motion generation tasks

02

Effectively models static and dynamic human topology graphs

03

Addresses motion format inconsistency with the new MC-Bench benchmark

Abstract

Whole-body multimodal motion generation, controlled by text, speech, or music, has numerous applications including video generation and character animation. However, employing a unified model to achieve various generation tasks with different condition modalities presents two main challenges: motion distribution drifts across different tasks (e.g., co-speech gestures and text-driven daily actions) and the complex optimization of mixed conditions with varying granularities (e.g., text and audio). Additionally, inconsistent motion formats across different tasks and datasets hinder effective training toward multimodal motion generation. In this paper, we propose MotionCraft, a unified diffusion transformer that crafts whole-body motion with plug-and-play multimodal control. Our framework employs a coarse-to-fine training strategy, starting with the first stage of text-to-motion semantic…

Code & Models

Repositories

cure-lab/MotionCraft
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Hand Gesture Recognition Systems

Methods: Diffusion