MuDreamer: Learning Predictive World Models without Reconstruction
Maxime Burchi, Radu Timofte

TL;DR
MuDreamer is a reinforcement learning agent that learns a predictive world model without pixel reconstruction, improving robustness to visual distractions and training faster than DreamerV3.
Contribution
It proposes a reconstruction-free world model learning method that improves robustness and training efficiency in reinforcement learning agents.
Findings
MuDreamer outperforms DreamerV3 under visual distractions.
It achieves comparable Atari100k performance with faster training.
Reconstruction-free modeling enhances robustness to irrelevant visual inputs.
Abstract
The DreamerV3 agent recently demonstrated state-of-the-art performance in diverse domains, learning powerful world models in latent space using a pixel reconstruction loss. However, while the reconstruction loss is essential to Dreamer's performance, it also necessitates modeling unnecessary information. Consequently, Dreamer sometimes fails to perceive crucial elements which are necessary for task-solving when visual distractions are present in the observation, significantly limiting its potential. In this paper, we present MuDreamer, a robust reinforcement learning agent that builds upon the DreamerV3 algorithm by learning a predictive world model without the need for reconstructing input signals. Rather than relying on pixel reconstruction, hidden representations are instead learned by predicting the environment value function and previously selected actions. Similar to predictive…
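As a rough illustration of the objective described in the abstract, the sketch below trains the representation with two predictive terms instead of a pixel-reconstruction term. A value head is regressed onto λ-return targets, and an action head is trained to recover the previously selected action. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. The function names, the squared-error value loss (a simplification), the equal loss weights, and the γ = 0.997, λ = 0.95 defaults are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def lambda_returns(rewards: Sequence[float], continues: Sequence[float],
                   next_values: Sequence[float], gamma: float = 0.997,
                   lam: float = 0.95) -> List[float]:
    """TD(lambda) targets: R_t = r_t + gamma * c_t * ((1 - lam) * V_{t+1} + lam * R_{t+1}).

    next_values[t] holds V(s_{t+1}); the last entry bootstraps the recursion.
    The gamma/lam defaults are assumed values, not taken from the paper.
    """
    returns = [0.0] * len(rewards)
    running = next_values[-1]
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * continues[t] * (
            (1.0 - lam) * next_values[t] + lam * running)
        returns[t] = running
    return returns


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-likelihood of `label` under softmax(logits), computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def predictive_loss(value_preds, value_targets, action_logits, prev_actions,
                    value_scale=1.0, action_scale=1.0):
    """Hypothetical reconstruction-free representation loss (no pixel term).

    value_preds[t]   : value head output from the latent state at step t
    value_targets[t] : lambda-return target, treated as a constant
    action_logits[t] : action head logits from the latent state at step t
    prev_actions[t]  : index of the action selected just before step t
    """
    n = len(value_preds)
    value_loss = sum((p - y) ** 2 for p, y in zip(value_preds, value_targets)) / n
    action_loss = sum(cross_entropy(l, a)
                      for l, a in zip(action_logits, prev_actions)) / n
    return value_scale * value_loss + action_scale * action_loss


# Usage: with gamma = lam = 1 the targets are plain reward-to-go sums.
targets = lambda_returns([1.0, 1.0], [1.0, 1.0], [0.0, 0.0], gamma=1.0, lam=1.0)
print(targets)  # [2.0, 1.0]
# Perfect value predictions, uniform action logits -> loss = log(2)
print(predictive_loss([2.0, 1.0], targets, [[0.0, 0.0], [0.0, 0.0]], [0, 1]))
```

Nothing in this loss asks the latent state to encode pixels. It only needs enough information to predict task value and the agent's own previous action, which is the intuition behind the claimed robustness to task-irrelevant visual inputs.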
Peer Reviews
Decision · ICLR 2024 Conference Withdrawn Submission
1. The paper is clear and presents its scope, the problem it tackles, and related work well. 2. The presentation of the method is clear and the modifications are easy to follow (although they are quite directly inspired by the Dreamer papers). 3. Results are complete and well presented, with good coverage of Control Suite experiments as well as Atari100k. The choice of baselines is good. 4. The ablation study in Section 5.2 is clean and, once again, well executed.
1. The similarities between MuDreamer and DreamerV3 are potentially too strong to make this work significant enough in this state. The paper looks like yet another version of Dreamer, with the exact same math and text dangerously close to a copy, with only a few extra ablations and modifications. 2. Results aren’t as clear-cut as I’d like. There have been a lot of MBRL papers in recent years which explored many combinations of losses, models, actors, but it is quite hard to find which components
The motivation is sound, as this paper combines the strengths of DreamerV2 and MuZero to tackle tasks from image inputs with both continuous and discrete action spaces, without the need for input signal reconstruction.
1. The value predictors in this work are inspired by MuZero, and including action prediction is common practice in various model-based approaches. As a result, the novelty may be relatively limited. 2. The comparison is unfair because it considers only DreamerV3. It would be more equitable to include more model-based methods, such as DreamerPro[1] and Denoised MDPs[2]. DreamerPro, in particular, is a highly relevant method within the domain of reconstruction-free model-ba
1. This paper is well-organized and easy to follow. 2. The proposed model is extensively evaluated on widely used visual control benchmarks. It also provides comprehensive ablation studies to explore the effectiveness of each model component. 3. The model achieves performance comparable to DreamerV3, and as claimed by the authors, it is more efficient in the training time.
1. As stated by the authors, 'MuDreamer solves tasks without the need for a reconstruction loss.' However, this seems to contradict the loss function described in Eq. (3), which still involves optimizing the image decoder with a reconstruction loss. If my understanding is accurate, the distinction from DreamerV3 lies in the fact that the gradient from the reconstruction loss doesn't back-propagate to the dynamics module. In light of this, I recommend that the authors consider revising t
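The reviewer's reading can be made concrete with a toy scalar model. This model is hypothetical and not taken from the paper: a linear encoder e = w·x feeds a linear decoder x̂ = u·e trained on the reconstruction loss L = (x̂ − x)². When a stop-gradient is placed on e, the decoder weight u still receives ∂L/∂u, but the encoder weight w receives no gradient from L. The gradients are derived by hand to avoid assuming any particular autodiff library.

```python
from typing import Tuple


def reconstruction_grads(x: float, w: float, u: float,
                         stop_gradient: bool = True) -> Tuple[float, float]:
    """Gradients of L = (u * e - x) ** 2 with e = w * x (toy scalar model).

    Returns (dL/du, dL/dw). With stop_gradient=True the latent e is treated
    as a constant inside L, so the encoder weight w gets no reconstruction
    gradient while the decoder weight u is still trained.
    """
    e = w * x                  # encoder output (latent)
    err = u * e - x            # decoder reconstruction error
    grad_u = 2.0 * err * e     # decoder always learns to reconstruct
    # chain rule through e, cut off when the stop-gradient is applied
    grad_w = 0.0 if stop_gradient else 2.0 * err * u * x
    return grad_u, grad_w


# Decoder is trained, encoder is untouched by the reconstruction loss:
print(reconstruction_grads(1.0, 1.0, 2.0, stop_gradient=True))   # (2.0, 0.0)
# Without the stop-gradient, the encoder is also updated by reconstruction:
print(reconstruction_grads(1.0, 1.0, 2.0, stop_gradient=False))  # (2.0, 4.0)
```

In a real agent, the same effect comes from wrapping the latent in the framework's stop-gradient operation before it enters the decoder. The decoder then remains useful for visualization while the representation is shaped only by the other losses.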
