ReAlign: Bilingual Text-to-Motion Generation via Step-Aware Reward-Guided Alignment

Wanjiang Weng; Xiaofeng Tan; Hongsong Wang; Pan Zhou

arXiv:2505.04974·cs.CV·August 4, 2025

Open Access

TL;DR

ReAlign introduces a novel reward-guided diffusion approach for bilingual text-to-motion generation, addressing dataset scarcity and alignment issues to produce semantically consistent and high-quality 3D motions from bilingual texts.

Contribution

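The paper presents three components: BiHumanML3D, a new bilingual human motion dataset; BiMD, a unified Bilingual Motion Diffusion model built on cross-lingual aligned representations; and ReAlign, a reward-guided sampling method that targets text-motion alignment and motion quality.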

Findings

01. Enhanced text-motion alignment accuracy

02. Improved motion realism and diversity

03. Outperforms existing state-of-the-art methods

Abstract

Bilingual text-to-motion generation, which synthesizes 3D human motions from bilingual text inputs, holds immense potential for cross-linguistic applications in gaming, film, and robotics. However, this task faces critical challenges: the absence of bilingual motion-language datasets and the misalignment between text and motion distributions in diffusion models, leading to semantically inconsistent or low-quality motions. To address these challenges, we propose BiHumanML3D, a novel bilingual human motion dataset, which establishes a crucial benchmark for bilingual text-to-motion generation models. Furthermore, we propose a Bilingual Motion Diffusion model (BiMD), which leverages cross-lingual aligned representations to capture semantics, thereby achieving a unified bilingual model. Building upon this, we propose the Reward-guided sampling Alignment (ReAlign) method, comprising a step-aware…
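
The general idea of reward-guided sampling can be illustrated with a toy sketch. This is not the paper's implementation: the abstract is truncated before it describes the step-aware component, so every name here (`toy_denoise_step`, `toy_reward_grad`, `step_weight`, `reward_guided_sample`), the quadratic reward, and the linear step schedule are assumptions chosen only to show the pattern of nudging each denoising step along a reward gradient.

```python
import numpy as np


def toy_denoise_step(x, num_steps):
    """Stand-in for one reverse-diffusion update (assumption, not a trained model).

    It simply shrinks the sample toward the origin.
    """
    return x * (1.0 - 1.0 / (num_steps + 1))


def toy_reward_grad(x, target):
    """Gradient of the toy reward r(x) = -0.5 * ||x - target||^2 with respect to x."""
    return target - x


def step_weight(t, num_steps):
    """Hypothetical step-dependent guidance weight.

    Returns 0 at the first (noisiest) step and grows linearly as t decreases.
    """
    return 1.0 - t / num_steps


def reward_guided_sample(target, num_steps=50, scale=0.5, guided=True, seed=0):
    """Run the toy reverse process, optionally adding a reward-gradient nudge."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(target.shape)  # start from Gaussian noise
    for t in range(num_steps, 0, -1):
        x = toy_denoise_step(x, num_steps)
        if guided:
            # Move the sample in the direction that increases the toy reward.
            x = x + scale * step_weight(t, num_steps) * toy_reward_grad(x, target)
    return x


if __name__ == "__main__":
    target = np.array([1.0, -2.0, 0.5])  # toy stand-in for a "text-aligned" motion
    guided = reward_guided_sample(target, guided=True)
    unguided = reward_guided_sample(target, guided=False)
    print("guided distance:  ", np.linalg.norm(guided - target))
    print("unguided distance:", np.linalg.norm(unguided - target))
```

In this toy setting the guided sample ends closer to `target` than the unguided one, because each step trades off the denoiser's update against the reward gradient. In a real system the reward would presumably come from a learned text-motion alignment model rather than a fixed target vector.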

Taxonomy

Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis

Methods: Diffusion