ResWM: Residual-Action World Model for Visual RL

Jseen Zhang, Gabriel Adineera, Jinzhou Tan, and Jinoh Kim

arXiv:2603.11110·cs.RO·March 13, 2026

Open Access

TL;DR

ResWM introduces a residual-action framework for visual RL that improves stability, sample efficiency, and control smoothness by modeling incremental adjustments instead of absolute actions, benefiting real-world robotic applications.

Contribution

The paper proposes the Residual-Action World Model (ResWM), a novel approach that reformulates control as residual actions, enhancing stability and efficiency in visual RL with minimal modifications to existing models.

Findings

01. ResWM outperforms Dreamer and TD-MPC on the DeepMind Control Suite.

02. ResWM achieves more stable and energy-efficient action trajectories.

03. ResWM improves sample efficiency and control smoothness.

Abstract

Learning predictive world models from raw visual observations is a central challenge in reinforcement learning (RL), especially for robotics and continuous control. Conventional model-based RL frameworks directly condition future predictions on absolute actions, which makes optimization unstable: the optimal action distributions are task-dependent, unknown a priori, and often lead to oscillatory or inefficient control. To address this, we introduce the Residual-Action World Model (ResWM), a new framework that reformulates the control variable from absolute actions to residual actions -- incremental adjustments relative to the previous step. This design aligns with the inherent smoothness of real-world control, reduces the effective search space, and stabilizes long-horizon planning. To further strengthen the representation, we propose an Observation Difference Encoder that explicitly…
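
The residual-action idea described in the abstract (the control variable is an incremental adjustment relative to the previous step) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-step bound `max_step`, and the action limits `low` and `high` are assumptions made only for illustration.

```python
from typing import List, Sequence


def apply_residual(
    prev_action: Sequence[float],
    residual: Sequence[float],
    max_step: float = 0.1,   # assumed bound on the per-step adjustment
    low: float = -1.0,       # assumed lower action limit
    high: float = 1.0,       # assumed upper action limit
) -> List[float]:
    """Return the next absolute action: previous action plus a bounded residual."""
    next_action = []
    for a_prev, delta in zip(prev_action, residual):
        # Bound the residual so consecutive actions cannot jump arbitrarily.
        delta = max(-max_step, min(max_step, delta))
        # Add the residual to the previous action and keep it in the valid range.
        next_action.append(max(low, min(high, a_prev + delta)))
    return next_action


def rollout(
    initial_action: Sequence[float],
    residuals: Sequence[Sequence[float]],
    **kwargs: float,
) -> List[List[float]]:
    """Accumulate a sequence of residuals into a trajectory of absolute actions."""
    actions = []
    action = list(initial_action)
    for residual in residuals:
        action = apply_residual(action, residual, **kwargs)
        actions.append(action)
    return actions
```

In this sketch a policy would output `residual` rather than the absolute action, so the search at each step is confined to a small neighborhood of the previous action. That is one plausible reading of how a residual formulation reduces the effective search space and yields smoother trajectories.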

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Robot Manipulation and Learning