R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation

Naoki Morihira; Amal Nahar; Kartik Bharadwaj; Yasuhiro Kato; Akinobu Hayashi; Tatsuya Harada

arXiv:2603.18202·cs.LG·March 23, 2026

Open Access · 3 Reviews

TL;DR

R2-Dreamer introduces a decoder-free, regularizer-based approach for image-based model-based reinforcement learning that improves training speed and performance without relying on external data augmentation.

Contribution

It presents a novel internal regularizer inspired by Barlow Twins, enabling effective, decoder-free representations in MBRL without external augmentation.

Findings

01. Competitive performance on DeepMind Control Suite and Meta-World

02. Training speed is 1.59x faster than DreamerV3

03. Significant gains on DMC-Subtle with tiny objects

Abstract

A central challenge in image-based Model-Based Reinforcement Learning (MBRL) is to learn representations that distill essential information from irrelevant visual details. While promising, reconstruction-based methods often waste capacity on large task-irrelevant regions. Decoder-free methods instead learn robust representations by leveraging Data Augmentation (DA), but reliance on such external regularizers limits versatility. We propose R2-Dreamer, a decoder-free MBRL framework with a self-supervised objective that serves as an internal regularizer, preventing representation collapse without resorting to DA. The core of our method is a redundancy-reduction objective inspired by Barlow Twins, which can be easily integrated into existing frameworks. On DeepMind Control Suite and Meta-World, R2-Dreamer is competitive with strong baselines such as DreamerV3 and TD-MPC2 while training…
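The abstract names a redundancy-reduction objective inspired by Barlow Twins but does not spell it out here. As a rough illustration only, the sketch below implements the generic Barlow Twins loss on two batches of embeddings: standardize each feature over the batch, form the cross-correlation matrix, pull its diagonal toward 1 and its off-diagonal entries toward 0. The function name `barlow_twins_loss`, the weight `lambda_offdiag`, the feature size of 32, and the choice of which two embeddings are paired are all assumptions for illustration, not details taken from the paper. The batch shape follows the $B=16, T=64$ setting mentioned in one of the reviews.

```python
import numpy as np


def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3, eps=1e-8):
    """Generic Barlow Twins-style loss for two (N, D) embedding batches.

    This is an illustrative sketch, not the R2-Dreamer implementation.
    """
    n = z_a.shape[0]
    # Standardize every feature dimension over the batch axis.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two embeddings, shape (D, D).
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    # Invariance term: matching features should correlate perfectly.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Redundancy-reduction term: distinct features should be decorrelated.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return on_diag + lambda_offdiag * off_diag


rng = np.random.default_rng(0)
B, T, D = 16, 64, 32  # D = 32 is a hypothetical feature size
# Flatten batch and time into N = B * T samples before computing the loss.
z = rng.standard_normal((B, T, D)).reshape(B * T, D)

loss_same = barlow_twins_loss(z, z)  # identical, decorrelated views
loss_indep = barlow_twins_loss(z, rng.standard_normal((B * T, D)))
loss_collapsed = barlow_twins_loss(np.ones((B * T, D)), np.ones((B * T, D)))

print(loss_same, loss_indep, loss_collapsed)
```

In this toy setup, a collapsed (constant) representation leaves the diagonal at 0 rather than 1, so the invariance term penalizes it heavily. That is the sense in which such an objective can act as an internal regularizer against collapse.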

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 2

Strengths

1. Principled approach to removing DA: The paper successfully identifies a major pain point in current decoder-free methods: the reliance on heuristic DA. Proposing an information-theoretic internal regularizer (redundancy reduction) as a replacement is a sound and theoretically motivated direction, nicely grounded in the Sequential Information Bottleneck framework in Appendix A2.
2. Strong empirical validation of the core hypothesis: The ablation study (Figure 6) provides compelling evidence. It

Weaknesses

1. Batch size concerns for Barlow Twins: Redundancy reduction objectives often require large batch sizes for stable covariance estimation. The paper uses standard Dreamer batching ($B=16, T=64 \implies N=1024$). While this appears sufficient for DMC, it raises concerns about stability in higher-dimensional or more diverse visual environments, where 1024 samples might not suffice to estimate the cross-correlation matrix reliably.
2. Ambiguity in Encoder Training: The pseudocode (Algorithm 1) indicates t

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. The paper is clearly written, allowing readers to follow the main arguments.
2. It provides comprehensive experiments and ablations to demonstrate the effectiveness of the proposed representation learning paradigm.
3. Releasing the codebase is good, as it can facilitate future research.

Weaknesses

1. The authors did not compare against TD-MPC2 in their experiments, which is a strong state-of-the-art baseline for decoder-free methods.
2. Although the authors evaluated different methods on many tasks from the DMC benchmark, evaluating only on these locomotion tasks is not comprehensive; it would be better to also evaluate on other types of tasks, such as Meta-World.
3. One of the claims in this paper is that the method does not need hand-engineered data augmentation, unlike other decoder-free MBRL methods.

Reviewer 03 · Rating 8 · Confidence 4

Strengths

1. The paper clearly identifies a key limitation of DreamerV3: its latent representations can be overly influenced by reconstructing input images, leading to a focus on irrelevant pixels rather than task-relevant features. By replacing the reconstruction loss with a self-supervised objective defined in the latent space, the proposed method encourages more compact and task-relevant representations.
2. The paper provides solid theoretical analysis to explain the impact of the new learning objectiv

Weaknesses

1. The experimental environments are not sufficiently diverse. The paper evaluates only on DMC-Subtle, whereas DreamerV3 has also been tested on other benchmarks such as Atari and DMLab. Including experiments on these additional environments would make the evaluation more comprehensive and the conclusions more convincing.

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications