Text-driven Motion Generation: Overview, Challenges and Directions
Ali Rida Sahili, Najett Neji, Hedi Tabia

TL;DR
This paper provides a comprehensive survey of text-driven human motion generation, discussing current methods, datasets, challenges, and future directions in the field.
Contribution
It offers a structured overview of modern approaches, categorizing them by architecture and motion representation, and highlights key challenges and promising research directions.
Findings
Survey of VAE, diffusion, and hybrid models for text-to-motion
Analysis of discrete vs. continuous motion generation strategies
Overview of datasets, evaluation methods, and benchmarks
Abstract
Text-driven motion generation offers a powerful and intuitive way to create human movements directly from natural language. By removing the need for predefined motion inputs, it provides a flexible and accessible approach to controlling animated characters. This makes it especially useful in areas like virtual reality, gaming, human-computer interaction, and robotics. In this review, we first revisit the traditional perspective on motion synthesis, where models focused on predicting future poses from observed initial sequences, often conditioned on action labels. We then provide a comprehensive and structured survey of modern text-to-motion generation approaches, categorizing them from two complementary perspectives: (i) architectural, dividing methods into VAE-based, diffusion-based, and hybrid models; and (ii) motion representation, distinguishing between discrete and continuous…
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition · Multimodal Machine Learning Applications
Methods: Softmax · Attention Is All You Need
