MotionRFT: Unified Reinforcement Fine-Tuning for Text-to-Motion Generation

Xiaofeng Tan, Wanjiang Weng, Hongsong Wang, Fang Zhao, Xin Geng, and Liang Wang

arXiv:2603.27185 · cs.CV · March 31, 2026

TL;DR

MotionRFT introduces a reinforcement fine-tuning framework with a unified semantic reward and efficient step-wise optimization, significantly improving text-to-motion generation quality and efficiency.

Contribution

The paper proposes MotionReward, a multi-dimensional reward model that covers heterogeneous motion representations, and EasyTune, a fine-grained, memory-efficient fine-tuning method, to better align text-to-motion models.

Findings

01

Achieved an FID of 0.132 with 22.10 GB of memory on the MLD model.

02

Saved up to 15.22 GB of memory compared to DRaFT.

03

Improved FID and R-Precision metrics on multiple motion datasets.
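
For readers unfamiliar with R-Precision, the following is a minimal sketch of a retrieval-style computation, assuming the protocol commonly used in text-to-motion evaluation: each text is compared with a pool of motion embeddings by Euclidean distance, and a hit is counted when its matched motion ranks within the top k. The function name `r_precision`, the pool size of 32, and the synthetic embeddings are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def r_precision(text_emb, motion_emb, top_k=3):
    """Fraction of texts whose matched motion is among the top_k nearest motions.

    Row i of text_emb is assumed to be paired with row i of motion_emb.
    """
    # Pairwise Euclidean distances: rows are texts, columns are motions.
    diff = text_emb[:, None, :] - motion_emb[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Rank of the matched (diagonal) motion = number of strictly closer motions.
    matched = np.diag(dist)[:, None]
    rank = (dist < matched).sum(axis=1)
    return float((rank < top_k).mean())

# Synthetic embeddings (assumption): each motion is a slightly noisy copy of its text.
rng = np.random.default_rng(0)
text = rng.normal(size=(32, 16))
motion = text + 0.01 * rng.normal(size=(32, 16))

aligned_top1 = r_precision(text, motion, top_k=1)
mismatched_top1 = r_precision(text, motion[::-1], top_k=1)
```

Reversing the motion pool breaks every pairing, so the second score serves as a sanity check for a misaligned model.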

Abstract

Text-to-motion generation has advanced with diffusion- and flow-based generative models, yet supervised pretraining remains insufficient to align models with high-level objectives such as semantic consistency, realism, and human preference. Existing post-training methods have key limitations: they (1) target a specific motion representation, such as joints; (2) optimize a particular aspect, such as text-motion alignment, and may compromise other factors; and (3) incur substantial computational overhead, data dependence, and coarse-grained optimization. We present a reinforcement fine-tuning framework that comprises a heterogeneous-representation, multi-dimensional reward model, MotionReward, and an efficient, fine-grained fine-tuning method, EasyTune. To obtain a unified semantic representation, MotionReward maps heterogeneous motions into a shared semantic space anchored by text,…
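
The idea of mapping heterogeneous motions into one text-anchored space can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two representations (`joints`, `rotations`), their feature sizes, the random linear projection heads, mean pooling over frames, and cosine similarity as the reward. It is not the architecture of MotionReward, which the truncated abstract does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)
SHARED_DIM = 8  # hypothetical size of the shared semantic space

# Hypothetical per-representation projection heads (random here; learned in practice).
heads = {
    "joints": rng.normal(size=(66, SHARED_DIM)),      # e.g. 22 joints x 3 coordinates
    "rotations": rng.normal(size=(132, SHARED_DIM)),  # e.g. 22 joints x 6-D rotations
}

def embed_motion(motion, representation):
    """Mean-pool frames, project into the shared space, and L2-normalize."""
    pooled = motion.mean(axis=0)
    z = pooled @ heads[representation]
    return z / np.linalg.norm(z)

def semantic_reward(motion, representation, text_emb):
    """Cosine similarity between a motion embedding and a text embedding."""
    t = text_emb / np.linalg.norm(text_emb)
    return float(embed_motion(motion, representation) @ t)

# Two motions in different representations are scored against the same text anchor.
text_emb = rng.normal(size=SHARED_DIM)
r_joints = semantic_reward(rng.normal(size=(40, 66)), "joints", text_emb)
r_rot = semantic_reward(rng.normal(size=(40, 132)), "rotations", text_emb)
```

Because both heads project into the same space, one text embedding can score motions from either representation, which is the property the abstract attributes to a unified reward.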
