Reconstruction-Anchored Diffusion Model for Text-to-Motion Generation

Yifei Liu; Changxing Ding; Ling Guo; Huaiguang Jiang; Qiong Cao

arXiv:2601.14788 · cs.CV · January 22, 2026

TL;DR

This paper introduces RAM, a novel diffusion model for text-to-motion generation that uses motion latent space supervision and a reconstructive error guidance mechanism to improve accuracy and reduce error propagation.

Contribution

RAM combines motion latent space supervision with a reconstructive error guidance technique, advancing the state of the art in text-to-motion generation.

Findings

1. Achieves state-of-the-art performance on benchmark datasets.
2. Significantly reduces error propagation during denoising.
3. Improves motion generation accuracy and diversity.

Abstract

Diffusion models have seen widespread adoption for text-driven human motion generation and related tasks due to their impressive generative capabilities and flexibility. However, current motion diffusion models face two major limitations: a representational gap caused by pre-trained text encoders that lack motion-specific information, and error propagation during the iterative denoising process. This paper introduces the Reconstruction-Anchored Diffusion Model (RAM) to address these challenges. First, RAM leverages a motion latent space as intermediate supervision for text-to-motion generation. To this end, RAM co-trains a motion reconstruction branch with two key objective functions: self-regularization to enhance the discrimination of the motion space and motion-centric latent alignment to enable accurate mapping from text to the motion latent space. Second, we propose Reconstructive…
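The abstract describes co-training a motion reconstruction branch next to the text-conditioned diffusion model, with an alignment objective that maps text into the motion latent space. The (truncated) abstract does not give the loss formulas, so the following is only a minimal sketch under stated assumptions, not RAM's actual implementation. It assumes MSE for both the denoising and reconstruction terms, cosine distance for the text-to-motion latent alignment, and hypothetical weights `w_recon` and `w_align`. The self-regularization term is left out because its form is not stated here, and the reconstructive error guidance step is not sketched for the same reason.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length, non-empty vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: 0 when the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / (na * nb)


def co_training_loss(
    denoise_pred: Vector,   # ASSUMPTION: diffusion branch predicts the clean motion
    denoise_target: Vector,
    recon_motion: Vector,   # reconstruction branch output (decoded motion)
    motion: Vector,         # ground-truth input motion
    text_latent: Vector,    # text features mapped into the motion latent space
    motion_latent: Vector,  # latent produced by the motion encoder
    w_recon: float = 1.0,   # hypothetical weight, not from the paper
    w_align: float = 1.0,   # hypothetical weight, not from the paper
) -> float:
    """Hypothetical weighted sum of denoising, reconstruction and alignment terms."""
    l_diff = mse(denoise_pred, denoise_target)
    l_recon = mse(recon_motion, motion)
    l_align = cosine_distance(text_latent, motion_latent)
    return l_diff + w_recon * l_recon + w_align * l_align


if __name__ == "__main__":
    # Perfect predictions and parallel latents give a loss of 0.
    print(co_training_loss([1.0, 0.0], [1.0, 0.0], [0.5], [0.5],
                           [1.0, 0.0], [2.0, 0.0]))
```

Cosine distance is used here so that alignment depends only on direction in the latent space; a real implementation might instead use MSE or a contrastive objective, and would operate on batched tensors in a framework with automatic differentiation.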

Taxonomy

Topics: Human Motion and Animation · 3D Shape Modeling and Analysis · Social Robot Interaction and HRI