Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
Samaneh Azadi, Akbar Shah, Thomas Hayes, Devi Parikh, Sonal Gupta

TL;DR
Make-An-Animation is a novel large-scale, text-conditioned 3D human motion generation model that leverages image-text datasets and diffusion models to produce diverse, realistic motions aligned with input prompts.
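The TL;DR names diffusion models as the generative backbone. The block below is a minimal sketch of the forward noising process and noise-prediction objective that diffusion models are trained with. It assumes the standard DDPM formulation with a linear noise schedule, and this summary does not confirm that this is the paper's exact setup. The flat pose vector and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM default, assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: Sequence[float]) -> List[float]:
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0: Sequence[float], t: int, alpha_bars: Sequence[float],
             rng: random.Random) -> Tuple[List[float], List[float]]:
    """Draw x_t ~ q(x_t | x_0) in closed form; returns (x_t, the Gaussian noise used)."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]
    return xt, noise


def noise_prediction_loss(predicted: Sequence[float], noise: Sequence[float]) -> float:
    """Mean squared error between the denoiser's predicted noise and the true noise."""
    return sum((p - n) ** 2 for p, n in zip(predicted, noise)) / len(noise)


# Hypothetical 3-D pose feature vector; real pose representations are much larger.
rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
x0 = [0.1, -0.3, 0.5]
xt, eps = q_sample(x0, 999, alpha_bars, rng)
```

A text-conditioned denoiser would be trained to predict `eps` from `xt`, the step index, and the prompt. Generation then runs the process in reverse, starting from pure noise.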
Contribution
It introduces a two-stage training process using large-scale image-text data and motion capture data, enhancing diversity and realism in text-guided human motion generation.
Findings
Achieves state-of-the-art performance in text-to-motion generation.
Outperforms prior models in motion realism and prompt alignment.
Uses a U-Net architecture similar to those of text-to-video models.
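The Methods taxonomy lists U-Net, max pooling, and concatenated skip connections. This toy sketch shows only how those pieces are wired along the time axis of a motion signal. The block functions are placeholders rather than learned layers, every name is hypothetical, and the paper's actual layer layout is not described in this summary.

```python
from typing import Callable, List

Signal = List[List[float]]  # [channels][frames]


def relu(x: Signal) -> Signal:
    """Placeholder 'encoder block': elementwise ReLU."""
    return [[max(0.0, v) for v in ch] for ch in x]


def max_pool_time(x: Signal, k: int = 2) -> Signal:
    """Downsample along time: max over non-overlapping windows of k frames."""
    return [[max(ch[i:i + k]) for i in range(0, len(ch), k)] for ch in x]


def upsample_time(x: Signal, k: int = 2) -> Signal:
    """Nearest-neighbour upsampling along time: repeat every frame k times."""
    return [[v for v in ch for _ in range(k)] for ch in x]


def concat_channels(a: Signal, b: Signal) -> Signal:
    """Concatenated skip connection: stack the channels of two same-length signals."""
    if len(a[0]) != len(b[0]):
        raise ValueError("frame counts must match before concatenation")
    return a + b


def tiny_unet(x: Signal,
              encode: Callable[[Signal], Signal],
              bottleneck: Callable[[Signal], Signal],
              decode: Callable[[Signal], Signal]) -> Signal:
    skip = encode(x)                          # full-resolution features, kept for the skip path
    coarse = bottleneck(max_pool_time(skip))  # process at half the temporal resolution
    up = upsample_time(coarse)                # restore the original frame count
    return decode(concat_channels(up, skip))  # decoder sees coarse and fine features together


out = tiny_unet([[1.0, 3.0, 2.0, 0.0], [-1.0, -2.0, 4.0, 5.0]],
                encode=relu, bottleneck=lambda h: h, decode=lambda h: h)
```

Concatenation (rather than addition) doubles the channel count the decoder receives. It also assumes the frame count is divisible by the pooling factor, which is why `concat_channels` rejects mismatched lengths.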
Abstract
Text-guided human motion generation has drawn significant interest because of its impactful applications spanning animation and robotics. Recently, the application of diffusion models to motion generation has improved the quality of generated motions. However, existing approaches are limited by their reliance on relatively small-scale motion capture data, leading to poor performance on more diverse, in-the-wild prompts. In this paper, we introduce Make-An-Animation, a text-conditioned human motion generation model which learns more diverse poses and prompts from large-scale image-text datasets, enabling significant improvement in performance over prior works. Make-An-Animation is trained in two stages. First, we train on a curated large-scale dataset of (text, static pseudo-pose) pairs extracted from image-text datasets. Second, we fine-tune on motion capture data, adding…
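The abstract describes two training stages that consume differently shaped data. This sketch shows one way to give both stages a single `[frames][pose_dim]` layout. It rests on our own assumption, not a statement in the summary, that each image-derived static pseudo-pose is treated as a one-frame motion. The names `MotionExample`, `from_pseudo_pose`, `from_mocap` and `training_stream` are hypothetical, and the layers added in the second stage (elided in the truncated abstract) are not modelled.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Sequence, Tuple

Pose = List[float]  # hypothetical flat pose vector, e.g. joint rotations


@dataclass
class MotionExample:
    text: str
    frames: List[Pose]  # shape [num_frames][pose_dim]


def from_pseudo_pose(text: str, pose: Sequence[float]) -> MotionExample:
    """Stage 1: a static pseudo-pose extracted from an image becomes a one-frame 'motion'."""
    return MotionExample(text=text, frames=[list(pose)])


def from_mocap(text: str, clip: Sequence[Sequence[float]]) -> MotionExample:
    """Stage 2: a motion-capture clip keeps every frame."""
    return MotionExample(text=text, frames=[list(p) for p in clip])


def training_stream(
    pose_pairs: Iterable[Tuple[str, Sequence[float]]],
    mocap_pairs: Iterable[Tuple[str, Sequence[Sequence[float]]]],
) -> Iterator[Tuple[int, MotionExample]]:
    """Yield (stage, example): all stage-1 (text, pose) data, then stage-2 mocap data."""
    for text, pose in pose_pairs:
        yield 1, from_pseudo_pose(text, pose)
    for text, clip in mocap_pairs:
        yield 2, from_mocap(text, clip)


stream = list(training_stream(
    [("a person raising both arms", [0.0, 1.2, -0.4])],
    [("a person jumping", [[0.0, 0.1, 0.2], [0.1, 0.3, 0.2], [0.2, 0.5, 0.1]])],
))
```

With a shared layout, the only difference between the stages in this sketch is the length of the frame axis. How the paper actually bridges the stages is given in the full text rather than in this summary.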
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis
Methods: Convolution · Max Pooling · Diffusion · Concatenated Skip Connection · U-Net
