TL;DR
RISE is a self-improving robotic reinforcement learning framework built on a compositional world model that predicts future states and evaluates imagined outcomes, enabling efficient policy improvement without physical environment resets.
Contribution
The paper presents a scalable, imagination-based RL framework with a compositional world model that enhances robot policy learning in contact-rich tasks.
Findings
Improved performance on dynamic brick sorting by more than 35%.
Improved backpack packing success rate by 45%.
Improved box closing performance by 35%.
Abstract
Despite the sustained scaling on model capacity and data acquisition, Vision-Language-Action (VLA) models remain brittle in contact-rich and dynamic manipulation tasks, where minor execution deviations can compound into failures. While reinforcement learning (RL) offers a principled path to robustness, on-policy RL in the physical world is constrained by safety risk, hardware cost, and environment reset. To bridge this gap, we present RISE, a scalable framework of robotic reinforcement learning via imagination. At its core is a Compositional World Model that (i) predicts multi-view future via a controllable dynamics model, and (ii) evaluates imagined outcomes with a progress value model, producing informative advantages for the policy improvement. Such compositional design allows state and value to be tailored by best-suited yet distinct architectures and objectives. These components…
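The two-part design described in the abstract (a dynamics model that imagines future states, and a progress value model that scores them) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the additive dynamics, the distance-based progress value, and the choice to define each step's advantage as the change in progress value are all hypothetical stand-ins for the learned components.

```python
from typing import List, Sequence

Vector = Sequence[float]


def toy_dynamics(state: Vector, action: Vector) -> List[float]:
    """Hypothetical stand-in for a learned dynamics model.

    Assumes the next state is simply state + action.
    """
    return [s + a for s, a in zip(state, action)]


def toy_progress_value(state: Vector, goal: Vector) -> float:
    """Hypothetical progress value in (0, 1].

    Equals 1.0 at the goal and decays with Euclidean distance from it.
    """
    dist = sum((s - g) ** 2 for s, g in zip(state, goal)) ** 0.5
    return 1.0 / (1.0 + dist)


def imagined_advantages(
    state: Vector, actions: Sequence[Vector], goal: Vector
) -> List[float]:
    """Roll out actions in imagination and score each step.

    Assumption: the advantage of a step is the change in progress
    value between the current state and the imagined next state.
    """
    advantages = []
    value = toy_progress_value(state, goal)
    for action in actions:
        state = toy_dynamics(state, action)  # imagined next state
        next_value = toy_progress_value(state, goal)
        advantages.append(next_value - value)
        value = next_value
    return advantages


if __name__ == "__main__":
    # Two imagined steps that move from the origin to the goal.
    print(imagined_advantages([0.0, 0.0], [[3.0, 0.0], [0.0, 4.0]], [3.0, 4.0]))
```

In this sketch no physical rollout is needed: actions that move the imagined state toward the goal receive positive advantages, and actions that move it away receive negative ones. Because the two functions are separate, each could in principle be swapped for a different model without touching the other.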
