Learning Montezuma's Revenge from a Single Demonstration

Tim Salimans; Richard Chen

arXiv:1812.03381 · cs.LG · December 11, 2018 · 84 cites

Open Access

TL;DR

This paper introduces a reinforcement learning approach that learns to play Montezuma's Revenge from a single demonstration by starting episodes from demonstration states, significantly reducing exploration difficulty and achieving state-of-the-art scores.

Contribution

The paper presents a novel method that trains agents from a single demonstration by resetting to demonstration states, improving learning efficiency in sparse reward environments.

Findings

01 Achieved a high score of 74,500 on Montezuma's Revenge, surpassing previously published results.

02 In a simple toy environment, reduced the scaling of RL run-time from exponential to quadratic in the number of states between rewards.

03 Demonstrated that starting episodes from demonstration states is effective in sparse-reward tasks.
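The exponential-versus-quadratic claim can be made plausible with a back-of-envelope count. This is a sketch under assumptions that are not taken from the paper: a chain of N states between rewards, A actions per state of which exactly one advances the agent, and uniformly random exploration. The symbols N, A, and k are introduced here for illustration only.

```latex
% Starting every episode at the beginning, a random policy must pick
% all N correct actions in a row:
\Pr[\text{reward in one episode}] = A^{-N}
\quad\Longrightarrow\quad
\mathbb{E}[\text{episodes until first reward}] = A^{N}.

% Starting k steps before the reward, with the last k-1 steps already
% learned, only one new action must be found (about A episodes on
% average), and each episode lasts at most k steps:
\text{total steps} \;\lesssim\; \sum_{k=1}^{N} A\,k
= \frac{A\,N(N+1)}{2} = O(N^{2}).
```

Under these assumptions, the cost of each stage grows only linearly with its distance from the reward, which is where the quadratic total comes from.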

Abstract

We propose a new method for learning from a single demonstration to solve hard exploration tasks like the Atari game Montezuma's Revenge. Instead of imitating human demonstrations, as proposed in other recent works, our approach is to maximize rewards directly. Our agent is trained using off-the-shelf reinforcement learning, but starts every episode by resetting to a state from a demonstration. By starting from such demonstration states, the agent requires much less exploration to learn a game compared to when it starts from the beginning of the game at every episode. We analyze reinforcement learning for tasks with sparse rewards in a simple toy environment, where we show that the run-time of standard RL methods scales exponentially in the number of states between rewards. Our method reduces this to quadratic scaling, opening up many tasks that were previously infeasible. We then apply…
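The idea of starting episodes from demonstration states can be illustrated on a toy chain. The sketch below is not the paper's implementation: it swaps the off-the-shelf RL algorithm for a simple lookup table that remembers rewarded actions, and the names `run_episode`, `train_backward`, and `correct_actions` are hypothetical. It assumes a chain where one hidden action per state advances the agent and any other action ends the episode without reward. The start point begins one step before the reward and moves backward each time the agent succeeds.

```python
import random


def run_episode(start, policy, correct_actions, n_actions, rng):
    """Play one episode on the chain from `start`; return (success, visited)."""
    n = len(correct_actions)
    state, visited = start, []
    while state < n:
        # Use the remembered action if there is one, otherwise explore randomly.
        action = policy.get(state, rng.randrange(n_actions))
        visited.append((state, action))
        if action != correct_actions[state]:
            return False, visited  # wrong move: episode ends with no reward
        state += 1
    return True, visited  # reached the rewarding end of the chain


def train_backward(correct_actions, n_actions, rng, max_episodes=100_000):
    """Move the episode start point backward along the demonstrated states."""
    policy, steps = {}, 0
    start = len(correct_actions) - 1  # begin one step before the reward
    for _ in range(max_episodes):
        success, visited = run_episode(
            start, policy, correct_actions, n_actions, rng
        )
        steps += len(visited)
        if success:
            policy.update(visited)  # remember the actions that led to reward
            if start == 0:
                return policy, steps  # solved from the very beginning
            start -= 1  # solved from here: start one state earlier next time
    raise RuntimeError("did not finish within max_episodes")


if __name__ == "__main__":
    rng = random.Random(0)
    chain = [rng.randrange(4) for _ in range(12)]  # hidden correct actions
    learned, total_steps = train_backward(chain, 4, rng)
    print("environment steps used:", total_steps)
    print("random-search episodes expected from scratch:", 4 ** len(chain))
```

The agent never reads `correct_actions` directly; the list only defines the environment's dynamics, so the agent learns from reward alone, in the spirit of maximizing reward rather than imitating the demonstration.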

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Generative Adversarial Networks and Image Synthesis