TL;DR
ResDreamer introduces a hierarchical self-supervised world model that progressively abstracts world dynamics, achieving state-of-the-art sample efficiency for reinforcement learning in complex environments.
Contribution
The paper proposes a novel residual-based hierarchical world model that improves visual reasoning and scalability without relying on domain-specific knowledge.
Findings
Achieves state-of-the-art sample efficiency in RL tasks.
Demonstrates effective scaling with linear cross-layer communication.
Enables richer latent representations for complex environments.
Abstract
3D open-world environments with adversarial opponents remain a core challenge for reinforcement learning due to their vast state spaces. Effective reasoning representations are essential in such settings. While existing self-supervised visual foresight reasoning approaches often suffer from multi-step error accumulation, many recent studies resort to injecting domain-specific knowledge for more stable guidance. Our key insight is that the photorealistic fidelity of visual reasoning representations is secondary; what truly matters is providing informative, task-relevant signals. To this end, we propose ResDreamer, a hierarchical world model in which each higher-level layer is trained to reconstruct the residuals of the layer below. This design enables progressive abstraction of increasingly sophisticated world dynamics and fosters the emergence of richer latent representations. Drawing…
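The residual idea in the abstract, where each higher layer is trained on whatever the layer below failed to reconstruct, can be illustrated with a toy sketch. This is not the paper's architecture: the "layers" here are fixed random directions fitted by least squares rather than learned networks, and every name (`project`, `residual_stack`, `bases`) is hypothetical. It only shows the bookkeeping: layer k receives the residual of layer k-1 as its target, and the summed reconstructions approximate the original observation.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(target, basis):
    """Least-squares reconstruction of `target` along a single `basis` vector.

    Stands in for one layer's reconstruction (an assumption for illustration).
    """
    scale = dot(target, basis) / dot(basis, basis)
    return [scale * b for b in basis]


def residual_stack(observation, bases):
    """Run a stack in which layer k reconstructs the residual left by layer k-1."""
    target = list(observation)  # layer 0 targets the raw observation
    reconstructions, residual_norms = [], []
    for basis in bases:
        recon = project(target, basis)
        # The next layer's target is what this layer could not explain.
        target = [t - r for t, r in zip(target, recon)]
        reconstructions.append(recon)
        residual_norms.append(math.sqrt(dot(target, target)))
    return reconstructions, residual_norms


random.seed(0)
dim, depth = 8, 4  # toy sizes, chosen arbitrarily
observation = [random.gauss(0.0, 1.0) for _ in range(dim)]
bases = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(depth)]

reconstructions, residual_norms = residual_stack(observation, bases)
# Summing the per-layer reconstructions gives the stack's overall estimate.
total = [sum(vals) for vals in zip(*reconstructions)]
```

Because each step is a least-squares projection, the residual norm cannot grow from one layer to the next in this toy setting. Whether a comparable property holds for the learned layers in ResDreamer depends on details not given in the summary above.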
