Model-Based Reinforcement Learning for Atari

Lukasz Kaiser; Mohammad Babaeizadeh; Piotr Milos; Blazej Osinski; Roy H Campbell; Konrad Czechowski; Dumitru Erhan; Chelsea Finn; Piotr Kozakowski; Sergey Levine; Afroz Mohiuddin; Ryan Sepassi; George Tucker; Henryk Michalewski

arXiv:1903.00374·cs.LG·April 4, 2024·421 cites

TL;DR

This paper introduces SimPLe, a model-based reinforcement learning algorithm using video prediction models that learns effective Atari game policies with significantly less interaction than traditional model-free methods.

Contribution

The paper presents a novel model-based RL algorithm, SimPLe, and demonstrates its superior performance over state-of-the-art model-free algorithms in low data regimes for Atari games.

Findings

01 In the low-data regime, SimPLe outperforms state-of-the-art model-free algorithms in most of the Atari games evaluated.

02 In some games, SimPLe is more than ten times as data-efficient as the model-free baselines.

03 Among the model architectures compared, a novel video prediction architecture yields the best results.

Abstract

Model-free reinforcement learning (RL) can be used to learn effective policies for complex tasks, such as Atari games, even from image observations. However, this typically requires very large amounts of interaction -- substantially more, in fact, than a human would need to learn the same games. How can people learn so quickly? Part of the answer may be that people can learn how the game works and predict which actions will lead to desirable outcomes. In this paper, we explore how video prediction models can similarly enable agents to solve Atari games with fewer interactions than model-free methods. We describe Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models and present a comparison of several model architectures, including a novel architecture that yields the best results in our setting. Our experiments evaluate SimPLe on a…
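The abstract describes SimPLe only at a high level, so the following is a minimal sketch of the general model-based pattern it refers to, not the authors' implementation. It assumes a loop that alternates between collecting real experience, fitting a model to it, and improving the policy using only the model. Everything concrete here is a hypothetical stand-in: a six-state chain replaces the Atari emulator, a lookup table replaces the video prediction model, and value iteration replaces the paper's policy learner.

```python
import random

N_STATES = 6      # toy chain: states 0..5, reward on reaching state 5 (assumption)
ACTIONS = (0, 1)  # 0 = left, 1 = right


def real_step(state, action):
    """Ground-truth toy environment; stands in for the real game emulator."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def collect(policy, model, steps, rng, epsilon=0.3):
    """Act in the real environment and record every observed transition."""
    state = 0
    for _ in range(steps):
        explore = rng.random() < epsilon
        action = rng.choice(ACTIONS) if explore else policy[state]
        nxt, reward = real_step(state, action)
        model[(state, action)] = (nxt, reward)  # the "world model" is a table
        state = 0 if nxt == N_STATES - 1 else nxt  # reset at the goal
    return steps  # number of real interactions consumed


def improve_policy(model, sweeps=50, gamma=0.9):
    """Plan with the learned model only; no real interaction happens here."""
    values = [0.0] * N_STATES
    policy = [0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES):
            best_a, best_q = policy[s], float("-inf")
            for a in ACTIONS:
                if (s, a) not in model:
                    continue  # never observed, so the model cannot predict it
                nxt, r = model[(s, a)]
                q = r + gamma * values[nxt]
                if q > best_q:
                    best_a, best_q = a, q
            if best_q > float("-inf"):
                values[s], policy[s] = best_q, best_a
    return policy


def train(iterations=5, steps_per_iter=40, seed=0):
    """Alternate real data collection with model-only policy improvement."""
    rng = random.Random(seed)
    model = {}
    policy = [rng.choice(ACTIONS) for _ in range(N_STATES)]
    real_steps = 0
    for _ in range(iterations):
        real_steps += collect(policy, model, steps_per_iter, rng)
        policy = improve_policy(model)
    return policy, model, real_steps


if __name__ == "__main__":
    policy, model, real_steps = train()
    print("real interactions used:", real_steps)
    print("greedy action per state:", policy)
```

The point of the structure is that `improve_policy` can run as many sweeps as it likes without adding to `real_steps`, which is the sense in which a learned model can reduce the number of real interactions a policy needs.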

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Artificial Intelligence in Games