Self-supervised Hierarchical Visual Reasoning with World Model

Yuanfei Xu; Lin Liu; Wengang Zhou; Mingxiao Feng; Houqiang Li

arXiv:2605.17537 · cs.AI · May 19, 2026

TL;DR

ResDreamer is a hierarchical, self-supervised world model in which each higher layer reconstructs the residuals of the layer below, progressively abstracting world dynamics. It reports state-of-the-art sample efficiency for reinforcement learning in complex environments.

Contribution

The paper proposes a residual-based hierarchical world model that improves visual reasoning and scalability without relying on domain-specific knowledge.

Findings

01 Achieves state-of-the-art sample efficiency in RL tasks.

02 Demonstrates effective scaling with linear cross-layer communication.

03 Enables richer latent representations for complex environments.

Abstract

3D open-world environments with adversarial opponents remain a core challenge for reinforcement learning due to their vast state spaces. Effective reasoning representations are essential in such settings. While existing self-supervised visual foresight reasoning approaches often suffer from multi-step error accumulation, many recent studies resort to injecting domain-specific knowledge for more stable guidance. Our key insight is that the photorealistic fidelity of visual reasoning representations is secondary; what truly matters is providing informative, task-relevant signals. To this end, we propose ResDreamer, a hierarchical world model in which each higher-level layer is trained to reconstruct the residuals of the layer below. This design enables progressive abstraction of increasingly sophisticated world dynamics and fosters the emergence of richer latent representations. Drawing…
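The residual idea in the abstract (each higher layer reconstructs what the layer below failed to explain) can be illustrated with a deliberately small toy. The sketch below is not ResDreamer's architecture: it assumes scalar inputs, replaces the neural layers with one least-squares weight per hypothetical basis function, and uses made-up data. All names (`fit_scale`, `fit_residual_stack`, `bases`) are illustrative assumptions and do not come from the paper or its repository.

```python
from typing import Callable, List, Sequence, Tuple

Basis = Callable[[float], float]


def fit_scale(feature: Sequence[float], target: Sequence[float]) -> float:
    """Least-squares scalar w minimising sum((target - w * feature) ** 2)."""
    denom = sum(f * f for f in feature)
    if denom == 0.0:
        return 0.0
    return sum(f * t for f, t in zip(feature, target)) / denom


def mean_sq(values: Sequence[float]) -> float:
    """Mean of squared values, used here as the reconstruction error."""
    return sum(v * v for v in values) / len(values)


def fit_residual_stack(
    xs: Sequence[float], ys: Sequence[float], bases: Sequence[Basis]
) -> Tuple[List[float], List[float]]:
    """Fit layers in order; each layer is trained on the residual left below it.

    Returns one weight per layer and the error before any layer plus
    after each layer (so len(errors) == len(bases) + 1).
    """
    weights: List[float] = []
    residual = list(ys)
    errors = [mean_sq(residual)]
    for basis in bases:
        feature = [basis(x) for x in xs]
        w = fit_scale(feature, residual)  # this layer only sees the residual
        weights.append(w)
        residual = [r - w * f for r, f in zip(residual, feature)]
        errors.append(mean_sq(residual))
    return weights, errors


# Toy data (an assumption for illustration): a quadratic signal.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [2.0 + 3.0 * x + 0.5 * x * x for x in xs]
bases = [lambda x: 1.0, lambda x: x, lambda x: x * x]

weights, errors = fit_residual_stack(xs, ys, bases)
for level, err in enumerate(errors):
    print(f"after {level} layer(s): mean squared residual = {err:.4f}")
```

Because each layer is a least-squares fit to the current residual, the error cannot increase as layers are added. Whether and how this property carries over to the learned, latent-space layers in ResDreamer is something only the paper itself can answer.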

Code & Models

Repositories

XuYuanFei01/ResDreamer (GitHub)
