RoboScape-R: Unified Reward-Observation World Models for Generalizable Robotics Training via RL

Yinzhou Tang; Yu Shang; Yinuo Chen; Bingwen Wei; Xin Zhang; Shu'ang Yu; Liangzhi Shi; Chao Yu; Chen Gao; Wei Wu; Yong Li

arXiv:2512.03556·cs.RO·December 4, 2025

Open Access

TL;DR

RoboScape-R introduces a world model-based reward system that enhances the generalization of embodied policies in robotics by serving as a versatile environment proxy, leading to significant out-of-domain performance improvements.

Contribution

The paper presents RoboScape-R, a novel framework that uses a world model to generate endogenous rewards, enabling more general and effective reinforcement learning for robotics.
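
As a rough illustration of what "a world model that generates endogenous rewards" could look like as a training environment, the sketch below wraps a model that returns both the predicted next observation and a reward, so no task-specific reward function lives outside the model. Everything here is an assumption made for illustration: `ToyWorldModel`, `WorldModelEnv`, `rollout`, the additive dynamics, and the distance-to-goal reward are hypothetical stand-ins and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Observation = List[float]
Action = List[float]


@dataclass
class ToyWorldModel:
    """Hypothetical stand-in for a learned world model (not the paper's architecture)."""
    goal: Observation

    def predict(self, obs: Observation, action: Action) -> Tuple[Observation, float]:
        # Toy dynamics (assumption): the action is added to the observation.
        next_obs = [o + a for o, a in zip(obs, action)]
        # Toy endogenous reward (assumption): negative Euclidean distance
        # between the predicted observation and the goal observation.
        dist = sum((g - n) ** 2 for g, n in zip(self.goal, next_obs)) ** 0.5
        return next_obs, -dist


class WorldModelEnv:
    """Environment proxy: observation and reward both come from the model."""

    def __init__(self, model: ToyWorldModel, initial_obs: Observation):
        self.model = model
        self.initial_obs = list(initial_obs)
        self.obs = list(initial_obs)

    def reset(self) -> Observation:
        self.obs = list(self.initial_obs)
        return self.obs

    def step(self, action: Action) -> Tuple[Observation, float]:
        self.obs, reward = self.model.predict(self.obs, action)
        return self.obs, reward


def rollout(env: WorldModelEnv,
            policy: Callable[[Observation], Action],
            horizon: int) -> List[float]:
    """Run the policy inside the model-backed environment and collect rewards."""
    obs = env.reset()
    rewards = []
    for _ in range(horizon):
        obs, reward = env.step(policy(obs))
        rewards.append(reward)
    return rewards


goal = [1.0, 1.0]
env = WorldModelEnv(ToyWorldModel(goal=goal), initial_obs=[0.0, 0.0])
# Hand-written policy for the demo: move halfway toward the goal each step.
policy = lambda obs: [0.5 * (g - o) for g, o in zip(goal, obs)]
rewards = rollout(env, policy, horizon=5)
```

In an actual RL setup the hand-written policy would be replaced by a learned one and the toy model by a trained world model; the point of the sketch is only that the training loop never calls a handcrafted, task-specific reward function.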

Findings

01. Achieves 37.5% performance improvement over baselines in out-of-domain tests.

02. Effectively addresses traditional RL limitations with a unified training environment.

03. Demonstrates the world model's potential as an online training strategy.

Abstract

Achieving generalizable embodied policies remains a key challenge. Traditional policy learning paradigms, including both Imitation Learning (IL) and Reinforcement Learning (RL), struggle to cultivate generalizability across diverse scenarios. While IL policies often overfit to specific expert trajectories, RL suffers from the inherent lack of a unified and general reward signal necessary for effective multi-scene generalization. We posit that the world model is uniquely capable of serving as a universal environment proxy to address this limitation. However, current world models primarily focus on their ability to predict observations and still rely on task-specific, handcrafted reward functions, thereby failing to provide a truly general training environment. To address this problem, we propose RoboScape-R, a framework leveraging the world model to serve as a versatile, general-purpose…

Taxonomy

Topics: Reinforcement Learning in Robotics · Social Robot Interaction and HRI · Robot Manipulation and Learning