Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation

Samaneh Azadi; Akbar Shah; Thomas Hayes; Devi Parikh; Sonal Gupta

arXiv:2305.09662·cs.CV·May 17, 2023·1 cites

Open Access

TL;DR

Make-An-Animation is a novel large-scale, text-conditioned 3D human motion generation model that leverages image-text datasets and diffusion models to produce diverse, realistic motions aligned with input prompts.

Contribution

The paper introduces a two-stage training process: training first on large-scale image-text data, then fine-tuning on motion capture data, to improve the diversity and realism of text-guided human motion generation.

Findings

01. Achieves state-of-the-art performance in text-to-motion generation.

02. Outperforms prior models in motion realism and prompt alignment.

03. Utilizes a U-Net architecture similar to text-to-video models.

Abstract

Text-guided human motion generation has drawn significant interest because of its impactful applications spanning animation and robotics. Recently, application of diffusion models for motion generation has enabled improvements in the quality of generated motions. However, existing approaches are limited by their reliance on relatively small-scale motion capture data, leading to poor performance on more diverse, in-the-wild prompts. In this paper, we introduce Make-An-Animation, a text-conditioned human motion generation model which learns more diverse poses and prompts from large-scale image-text datasets, enabling significant improvement in performance over prior works. Make-An-Animation is trained in two stages. First, we train on a curated large-scale dataset of (text, static pseudo-pose) pairs extracted from image-text datasets. Second, we fine-tune on motion capture data, adding…
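
Since the abstract builds on diffusion models, a minimal sketch of the standard DDPM forward (noising) step may help readers unfamiliar with it. This is the generic textbook formulation, not the paper's confirmed implementation: the linear noise schedule, the 1000-step horizon, the function names, and the 6-value pose vector are all illustrative assumptions.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha(betas):
    """Running product of (1 - beta_t): the fraction of signal kept at step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]
    return xt, noise


# Hypothetical 6-value pose vector standing in for real joint parameters.
pose = [0.1, -0.3, 0.5, 0.0, 0.2, -0.1]
alpha_bars = cumulative_alpha(linear_beta_schedule(1000))
noisy_pose, eps = add_noise(pose, t=500, alpha_bars=alpha_bars, rng=random.Random(0))
```

In a text-conditioned setup, a denoising network would be trained to predict `eps` from `noisy_pose`, the step index, and a text embedding; that network is omitted here because the page does not describe it in enough detail.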

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis

Methods: Convolution · Max Pooling · Diffusion · Concatenated Skip Connection · U-Net