ReAlign: Bilingual Text-to-Motion Generation via Step-Aware Reward-Guided Alignment
Wanjiang Weng, Xiaofeng Tan, Hongsong Wang, Pan Zhou

TL;DR
ReAlign introduces a novel reward-guided diffusion approach for bilingual text-to-motion generation, addressing dataset scarcity and alignment issues to produce semantically consistent and high-quality 3D motions from bilingual texts.
Contribution
The paper presents a new bilingual human motion dataset, a unified bilingual diffusion model, and a reward-guided sampling method to improve alignment and motion quality.
Findings
Enhanced text-motion alignment accuracy
Improved motion realism and diversity
Outperforms existing state-of-the-art methods
Abstract
Bilingual text-to-motion generation, which synthesizes 3D human motions from bilingual text inputs, holds immense potential for cross-linguistic applications in gaming, film, and robotics. However, this task faces two critical challenges: the absence of bilingual motion-language datasets, and the misalignment between text and motion distributions in diffusion models, which leads to semantically inconsistent or low-quality motions. To address these challenges, we propose BiHumanML3D, a novel bilingual human motion dataset, which establishes a crucial benchmark for bilingual text-to-motion generation models. Furthermore, we propose a Bilingual Motion Diffusion model (BiMD), which leverages cross-lingual aligned representations to capture semantics, thereby achieving a unified bilingual model. Building upon this, we propose the Reward-guided sampling Alignment (ReAlign) method, comprising a step-aware…
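The abstract is cut off before it describes how the reward enters sampling, so the following is only a generic illustration of reward-guided sampling, not the paper's ReAlign procedure. It is a minimal sketch under stated assumptions: a Langevin-style sampler stands in for the diffusion model's reverse process, a standard normal prior stands in for the learned motion distribution, and a toy quadratic reward stands in for a text-motion alignment score. All names here (`prior_score`, `reward_grad`, `guided_sample`, `guidance_scale`, `target`) are hypothetical.

```python
import numpy as np


def prior_score(x):
    # Assumption: score (gradient of log-density) of a standard normal,
    # used as a stand-in for a learned denoiser.
    return -x


def reward_grad(x, target):
    # Assumption: gradient of a toy reward r(x) = -0.5 * (x - target)**2,
    # used as a stand-in for a text-motion alignment reward.
    return target - x


def guided_sample(n_samples, n_steps=500, step_size=0.01,
                  guidance_scale=0.0, target=2.0, seed=0):
    """Langevin-style sampling with an added reward-gradient term."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    for _ in range(n_steps):
        # Guidance: add the scaled reward gradient to the model's score.
        drift = prior_score(x) + guidance_scale * reward_grad(x, target)
        noise = rng.standard_normal(n_samples)
        x = x + step_size * drift + np.sqrt(2.0 * step_size) * noise
    return x


unguided = guided_sample(5000, guidance_scale=0.0)
guided = guided_sample(5000, guidance_scale=4.0)
print("unguided mean:", unguided.mean())
print("guided mean:  ", guided.mean())
```

In this toy setting the guided samples concentrate around `guidance_scale * target / (1 + guidance_scale)`, so a larger scale pulls samples closer to the high-reward region at the cost of moving further from the unguided distribution.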
Taxonomy
Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis
Methods: Diffusion
