When to use parametric models in reinforcement learning?
Hado van Hasselt, Matteo Hessel, John Aslanides

TL;DR
This paper investigates the conditions under which parametric models are most effective in reinforcement learning, comparing them with experience replay methods and validating findings on Atari games.
Contribution
The paper provides a theoretical and empirical analysis of when replay-based methods can match or outperform model-based approaches in reinforcement learning.
Findings
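Replay-based algorithms are hypothesised to be competitive with or better than model-based ones when the model is used only to generate fictional transitions from observed states for an otherwise model-free update rule.
The hypothesis was validated on Atari 2600 video games, where the replay-based algorithm attained state-of-the-art data efficiency.
Replay-based and model-based methods share important traits, including the ability to plan: using more computation, without additional data, to improve predictions and behaviour.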
Abstract
We examine the question of when and how parametric models are most useful in reinforcement learning. In particular, we look at commonalities and differences between parametric models and experience replay. Replay-based learning algorithms share important traits with model-based approaches, including the ability to plan: to use more computation without additional data to improve predictions and behaviour. We discuss when to expect benefits from either approach, and interpret prior work in this context. We hypothesise that, under suitable conditions, replay-based algorithms should be competitive with or better than model-based algorithms if the model is used only to generate fictional transitions from observed states for an update rule that is otherwise model-free. We validated this hypothesis on Atari 2600 video games. The replay-based algorithm attained state-of-the-art data efficiency,…
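To make the hypothesis concrete, here is a minimal sketch of the two planning styles the abstract compares. It is not the paper's Atari setup: the five-state chain environment, the tabular Q-learning update, and every name (`step`, `q_update`, `run`, `planner`) are assumptions chosen for illustration. Both planners start from an observed state-action pair and apply the same model-free update; the only difference is whether the outcome comes from the stored transition or from a learned model.

```python
import random
from collections import defaultdict

N_STATES, N_ACTIONS = 5, 2  # hypothetical toy chain: action 1 moves right, 0 moves left

def step(s, a):
    """Deterministic chain; reward 1 on reaching the last state, which ends the episode."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s_next == N_STATES - 1
    return (1.0 if done else 0.0), s_next, done

def q_update(q, s, a, r, s_next, done, alpha=0.5, gamma=0.9):
    """Model-free Q-learning update, shared by real and planning steps."""
    bootstrap = 0.0 if done else gamma * max(q[(s_next, b)] for b in range(N_ACTIONS))
    q[(s, a)] += alpha * (r + bootstrap - q[(s, a)])

def run(planner, episodes=20, plan_steps=10, seed=0):
    """planner is 'replay' or 'model' (illustrative labels, not from the paper)."""
    rng = random.Random(seed)
    q = defaultdict(float)
    replay = []  # stored real transitions
    model = {}   # tabular model: (s, a) -> (r, s_next, done), fitted from real data
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.randrange(N_ACTIONS)  # uniform random behaviour policy
            r, s_next, done = step(s, a)
            q_update(q, s, a, r, s_next, done)
            replay.append((s, a, r, s_next, done))
            model[(s, a)] = (r, s_next, done)
            for _ in range(plan_steps):  # planning: extra computation, no new data
                ps, pa, pr, pn, pd = rng.choice(replay)
                if planner == "model":
                    # Keep the observed state-action pair, but let the model
                    # generate the fictional outcome.
                    pr, pn, pd = model[(ps, pa)]
                q_update(q, ps, pa, pr, pn, pd)
            s = s_next
    return q

q_replay = run("replay")
q_model = run("model")
```

In this deterministic tabular toy the model reproduces the stored outcomes exactly, so the two planners produce identical value estimates. With a stochastic environment or an approximate parametric model, the generated outcome could differ from the stored one, and that gap is where model error would enter the otherwise model-free update.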
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Evolutionary Algorithms and Applications
