Text-driven Motion Generation: Overview, Challenges and Directions

Ali Rida Sahili; Najett Neji; Hedi Tabia

arXiv:2505.09379 · cs.CV · May 15, 2025

Open Access

TL;DR

This paper provides a comprehensive survey of text-driven human motion generation, discussing current methods, datasets, challenges, and future directions in the field.

Contribution

The survey gives a structured overview of modern approaches, categorizing them by architecture (VAE-based, diffusion-based, and hybrid) and by motion representation (discrete vs. continuous), and highlights key challenges and promising research directions.

Findings

1. Survey of VAE, diffusion, and hybrid models for text-to-motion
2. Analysis of discrete vs. continuous motion generation strategies
3. Overview of datasets, evaluation methods, and benchmarks

Abstract

Text-driven motion generation offers a powerful and intuitive way to create human movements directly from natural language. By removing the need for predefined motion inputs, it provides a flexible and accessible approach to controlling animated characters. This makes it especially useful in areas like virtual reality, gaming, human-computer interaction, and robotics. In this review, we first revisit the traditional perspective on motion synthesis, where models focused on predicting future poses from observed initial sequences, often conditioned on action labels. We then provide a comprehensive and structured survey of modern text-to-motion generation approaches, categorizing them from two complementary perspectives: (i) architectural, dividing methods into VAE-based, diffusion-based, and hybrid models; and (ii) motion representation, distinguishing between discrete and continuous…

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Multimodal Machine Learning Applications

Methods: Softmax · Attention Is All You Need