RL from Physical Feedback: Aligning Large Motion Models with Humanoid Control
Junpeng Yue, Zepeng Wang, Yuxuan Wang, Weishuai Zeng, Jiangxing Wang, Xinrun Xu, Yu Zhang, Sipeng Zheng, Ziluo Ding, Zongqing Lu

TL;DR
This paper introduces RLPF, a reinforcement learning framework that improves text-driven humanoid motion generation by ensuring physical feasibility and semantic alignment, bridging the gap between simulation and real-world deployment.
Contribution
The paper presents a novel physics-aware reinforcement learning approach that enhances the realism and semantic accuracy of text-conditioned humanoid motions for robot deployment.
Findings
RLPF significantly improves physical feasibility of generated motions.
The method maintains high semantic fidelity to text instructions.
Successful deployment on real humanoid robots demonstrated.
Abstract
This paper focuses on a critical challenge in robotics: translating text-driven human motions into executable actions for humanoid robots, enabling efficient and cost-effective learning of new behaviors. While existing text-to-motion generation methods achieve semantic alignment between language and motion, they often produce kinematically or physically infeasible motions unsuitable for real-world deployment. To bridge this sim-to-real gap, we propose Reinforcement Learning from Physical Feedback (RLPF), a novel framework that integrates physics-aware motion evaluation with text-conditioned motion generation. RLPF employs a motion tracking policy to assess feasibility in a physics simulator, generating rewards for fine-tuning the motion generator. Furthermore, RLPF introduces an alignment verification module to preserve semantic fidelity to text instructions. This joint optimization…
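The reward loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hard alignment gate, the 0.5 threshold, and the function names `combined_reward` and `group_relative_advantages` are assumptions made for illustration. The group-normalized advantage follows the general GRPO recipe, which one of the reviews below names as the fine-tuning method.

```python
from statistics import mean, pstdev
from typing import List


def combined_reward(tracked: bool, alignment_score: float,
                    alignment_threshold: float = 0.5) -> float:
    """Hypothetical reward for one generated motion.

    tracked: whether a tracking policy followed the motion in simulation.
    alignment_score: text-motion similarity in [0, 1] from some verifier.
    The hard gate and the 0.5 threshold are illustrative assumptions.
    """
    if alignment_score < alignment_threshold:
        return 0.0  # motion drifted from the text; no credit for feasibility
    return 1.0 if tracked else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: normalize rewards within one prompt's sample group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four motions sampled for one prompt: (tracked in sim?, alignment score)
samples = [(True, 0.9), (False, 0.8), (True, 0.2), (True, 0.7)]
rewards = [combined_reward(t, a) for t, a in samples]
advantages = group_relative_advantages(rewards)
```

In this sketch, motions that are both trackable and on-prompt get positive advantages, while infeasible or off-prompt motions get negative ones. A real system would obtain `tracked` from physics-simulator rollouts and `alignment_score` from a learned text-motion model.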
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- Addresses a real gap between text-driven motion generation and humanoid-feasible control. This is an open problem, and the integration of physics feedback into generative motion models is a meaningful exploration.
- Technical pipeline is clean and modular: motion tokenizer + LLM + motion-tracking RL + contrastive alignment.
- Sim-to-real experiments on a real Unitree G1 robot support some of the claims.

- The method largely reuses known components (VQ-motion tokenizer, supervised T2M pretraining, teacher–student tracking policies, GRPO fine-tuning) and combines them. The RL formulation mirrors RLHF but uses a physics reward instead of human preference. Conceptually sensible but not that novel.
- Results do not convincingly justify the added complexity. While the paper reports improvements in feasibility, the absolute success rates remain moderate (e.g., IsaacGym 0.83 for AMASS, 0.75 for MotionX-W)

- Clear motivation and formulation: The paper addresses a meaningful gap between text-driven motion generation and physically executable humanoid control, a problem of growing importance for humanoid generalization.
- Novel learning framework: The use of reinforcement learning from physics feedback (as an analogy to RLHF) is conceptually strong and well-motivated.

- Lack of qualitative baselines: The paper does not include visual qualitative comparisons in simulation or real-world deployment against baselines. Visual demonstrations would better support the claimed improvements in physical realism.
- Limited real-world evaluation: Only four real-world results are presented, which is quite limited for a robotics paper. The sample size is too small to validate generalization or reliability outside simulation.
- Dependency on weak tracking policy: The impro

* This work addresses an important and timely problem. Current text-to-motion models can generate high-quality kinematic motions consistent with text prompts. However, these motions are often not directly transferable to physical hardware due to issues such as limited physical feasibility and challenges in retargeting. Developing algorithms that produce robust, high-quality motions that can be readily transferred to physical robots is therefore a valuable research direction.
* Writing quality fo

* The proposed method seems overcomplicated to me. The idea of using RL to fine-tune a text-to-motion model based on some physical feasibility/transferability reward makes sense. But some of the later choices for the method are not well justified. For example: 1) Why is the base text-to-motion (T2M) model an LLM when most state-of-the-art (SOTA) T2M models are diffusion-based? 2) Why would a reward model derived from a tracking policy ensure that kinematic motions are better suited for physical

- The paper is well-structured and clearly written, making it easy to follow and understand.
- It addresses an important and timely problem at the intersection of large motion models and humanoid robot control.
- The proposed RLPF framework is novel, introducing the integration of reinforcement learning–based physical feedback into text-to-motion generation, effectively bridging the gap between semantic motion synthesis and physically executable control.

- The overall soundness of the proposed method is not entirely convincing. The definition of physical feasibility heavily depends on whether an RL-trained tracking controller can follow the generated and retargeted motion. This design raises several conceptual concerns.
- It is unclear whether the teacher–student framework is necessary. If the teacher policy can successfully track a reference trajectory but the student cannot due to missing privileged information, should the motion still be con
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Advanced Vision and Imaging
