ReAlign: Text-to-Motion Generation via Step-Aware Reward-Guided Alignment
Wanjiang Weng, Xiaofeng Tan, Junbo Wang, Guo-Sen Xie, Pan Zhou, Hongsong Wang

TL;DR
ReAlign introduces a reward-guided diffusion sampling approach with a step-aware reward model to improve the semantic alignment and realism of text-to-motion generation, outperforming existing methods in the reported experiments.
Contribution
The paper proposes a novel reward-guided sampling strategy with a step-aware reward model to enhance text-motion alignment in diffusion-based generation.
Findings
Significantly improves text-motion alignment accuracy.
Enhances motion realism and diversity.
Outperforms state-of-the-art methods in experiments.
Abstract
Text-to-motion generation, which synthesizes 3D human motions from text inputs, holds immense potential for applications in gaming, film, and robotics. Recently, diffusion-based methods have been shown to generate more diverse and realistic motions. However, there is a misalignment between text and motion distributions in diffusion models, which leads to semantically inconsistent or low-quality motions. To address this limitation, we propose Reward-guided sampling Alignment (ReAlign), comprising a step-aware reward model that assesses alignment quality during denoising sampling and a reward-guided strategy that directs the diffusion process toward an optimally aligned distribution. This reward model integrates step-aware tokens and combines a text-aligned module for semantic consistency with a motion-aligned module for realism, refining noisy motions at each timestep to balance…
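The abstract describes the sampler only at a high level. Below is a minimal NumPy sketch of generic gradient-based reward guidance applied to the predicted clean motion at each denoising step, a technique in the spirit of classifier guidance. It is not the authors' implementation and rests on several assumptions: the "motion" is a toy 16-frame scalar trajectory; `toy_denoiser` is a closed-form predictor for Gaussian data standing in for a trained text-conditioned model; `reward_and_grad` is a hypothetical hand-written reward whose text term (distance to a target trajectory) and realism term (penalty on frame-to-frame acceleration) only loosely mirror the text-aligned and motion-aligned modules; and the timestep-dependent weight is a crude stand-in for step awareness, whereas ReAlign learns a step-aware reward model with step-aware tokens. All names and parameters (`sample`, `scale`, `w_text`, `w_real`) are illustrative.

```python
import numpy as np

NUM_FRAMES = 16                               # toy motion: one scalar per frame (assumption)
NUM_STEPS = 100                               # number of diffusion steps
BETAS = np.linspace(1e-4, 0.1, NUM_STEPS)     # linear noise schedule (assumption)
ALPHA_BARS = np.cumprod(1.0 - BETAS)          # cumulative signal fraction per step


def toy_denoiser(x_t, alpha_bar):
    """Posterior-mean x0 prediction for toy data x0 ~ N(0, I).

    Hypothetical stand-in for a trained text-conditioned motion denoiser.
    """
    return np.sqrt(alpha_bar) * x_t


def second_diff(x):
    """Discrete acceleration (second difference) along the frame axis."""
    return x[2:] - 2.0 * x[1:-1] + x[:-2]


def reward_and_grad(x0, text_target, w_text=1.0, w_real=0.1):
    """Hypothetical reward = text-alignment term + realism (smoothness) term.

    text term:    -w_text * ||x0 - text_target||^2   (proxy for semantic match)
    realism term: -w_real * ||second_diff(x0)||^2    (proxy for plausible motion)
    Returns the scalar reward and its analytic gradient with respect to x0.
    """
    acc = second_diff(x0)
    reward = -w_text * np.sum((x0 - text_target) ** 2) - w_real * np.sum(acc ** 2)
    grad = -2.0 * w_text * (x0 - text_target)
    # Adjoint of the second-difference operator applied to acc.
    adj = np.zeros_like(x0)
    adj[:-2] += acc
    adj[1:-1] -= 2.0 * acc
    adj[2:] += acc
    grad -= 2.0 * w_real * adj
    return reward, grad


def sample(x_T, text_target=None, scale=0.1):
    """Deterministic DDIM sampling, optionally nudged up the reward gradient."""
    x_t = x_T.copy()
    for t in reversed(range(NUM_STEPS)):
        a_t = ALPHA_BARS[t]
        a_prev = ALPHA_BARS[t - 1] if t > 0 else 1.0
        x0_hat = toy_denoiser(x_t, a_t)
        if text_target is not None:
            # Step-dependent weight (assumption): trust the reward more as noise shrinks.
            _, grad = reward_and_grad(x0_hat, text_target)
            x0_hat = x0_hat + scale * a_t * grad
        # Noise estimate consistent with the (possibly guided) x0 prediction.
        eps_hat = (x_t - np.sqrt(a_t) * x0_hat) / np.sqrt(1.0 - a_t)
        x_t = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat
    return x_t


rng = np.random.default_rng(0)
x_T = rng.standard_normal(NUM_FRAMES)
target = np.linspace(0.0, 1.0, NUM_FRAMES)    # hypothetical "text-implied" trajectory
plain = sample(x_T)
guided = sample(x_T, text_target=target)
print(np.linalg.norm(plain - target), np.linalg.norm(guided - target))
```

Starting both runs from the same noise isolates the effect of the reward. Because the toy target is a straight line, its acceleration is zero, so the two reward terms agree here; in a real system they can pull in different directions, and their relative weights would need tuning.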
Taxonomy
Topics: Human Motion and Animation · 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis
