Never Give Up: Learning Directed Exploration Strategies

Adrià Puigdomènech Badia; Pablo Sprechmann; Alex Vitvitskyi; Daniel Guo; Bilal Piot; Steven Kapturowski; Olivier Tieleman; Martín Arjovsky; Alexander Pritzel; Andrew Bolt; Charles Blundell

arXiv:2002.06038 · cs.LG · February 17, 2020 · 80 cites

Open Access · 5 Repos

TL;DR

This paper introduces a reinforcement learning agent that learns directed exploration strategies by combining an episodic memory-based intrinsic reward with Universal Value Function Approximators (UVFA), improving performance on hard exploration Atari games, including Pitfall!, without demonstrations.

Contribution

The paper combines an episodic memory-based intrinsic reward with UVFA so that a single neural network learns many exploration policies simultaneously, each with a different trade-off between exploration and exploitation. Sharing one network across these policies enables transfer between them.
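The idea of one network serving many exploration policies can be illustrated with a small sketch. It assumes each policy is identified by an index `j` and trained on the extrinsic reward plus a policy-specific weight `beta` times the intrinsic reward. The function names, the linear spacing of the weights, and the value `beta_max=0.3` are hypothetical choices made for illustration, not details taken from this page.

```python
def beta_schedule(n_policies, beta_max=0.3):
    """Evenly spaced exploration weights (an assumed, illustrative spacing).

    Index 0 gets beta = 0 (purely exploitative); the last index gets beta_max.
    """
    if n_policies == 1:
        return [0.0]
    return [beta_max * j / (n_policies - 1) for j in range(n_policies)]


def mixed_reward(r_ext, r_int, beta):
    """Reward seen by one policy: extrinsic plus weighted intrinsic reward."""
    return r_ext + beta * r_int


def conditioned_input(obs, j, n_policies):
    """Append a one-hot policy index to the observation.

    Feeding the index as an extra input is what lets a single value
    network represent every policy in the family.
    """
    one_hot = [1.0 if i == j else 0.0 for i in range(n_policies)]
    return list(obs) + one_hot


if __name__ == "__main__":
    betas = beta_schedule(4)
    for j, beta in enumerate(betas):
        x = conditioned_input([0.5, -1.0], j, len(betas))
        print(j, round(beta, 3), mixed_reward(1.0, 2.0, beta), x)
```

Under these assumptions, the policy with `beta = 0` optimizes only the extrinsic reward, while the others trade some of it for novelty; all of them share the same parameters through the conditioned input.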

Findings

01

Doubles the base agent's performance on the hard exploration games in the Atari-57 suite

02

Achieves non-zero rewards in Pitfall! without demonstrations

03

Maintains a very high score across the remaining Atari-57 games

Abstract

We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies. We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies, thereby encouraging the agent to repeatedly revisit all states in its environment. A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control. We employ the framework of Universal Value Function Approximators (UVFA) to simultaneously learn many directed exploration policies with the same neural network, with different trade-offs between exploration and exploitation. By using the same neural network for different degrees of exploration/exploitation, transfer is demonstrated from predominantly…
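The k-nearest-neighbour intrinsic reward described in the abstract can be sketched roughly as follows. This is a simplified illustration, not the paper's exact formulation: the class name `EpisodicMemory`, the inverse-distance kernel, the constants `k` and `eps`, and the choice to clear the memory at episode boundaries are all assumptions. Embeddings are passed in as plain tuples of floats; in the paper they would come from the inverse dynamics model.

```python
import math


class EpisodicMemory:
    """Hypothetical buffer of state embeddings seen so far in one episode."""

    def __init__(self, k=3, eps=1e-3):
        self.k = k        # number of nearest neighbours to consult
        self.eps = eps    # small constant that keeps the kernel finite
        self.items = []   # stored embeddings

    def reward(self, emb):
        """Return a novelty bonus for `emb`, then store it in memory."""
        if not self.items:
            # Assumption: with nothing to compare against, treat the
            # state as maximally novel.
            bonus = 1.0
        else:
            # Squared Euclidean distance to every stored embedding,
            # sorted so the k closest come first.
            d2 = sorted(
                sum((a - b) ** 2 for a, b in zip(emb, m)) for m in self.items
            )
            # Inverse kernel: a neighbour at distance 0 contributes 1,
            # a distant neighbour contributes close to 0.
            sim = sum(self.eps / (d + self.eps) for d in d2[: self.k])
            bonus = 1.0 / math.sqrt(sim + 1.0)
        self.items.append(tuple(emb))
        return bonus

    def reset(self):
        """Clear the memory (assumed to happen at each episode boundary)."""
        self.items.clear()


if __name__ == "__main__":
    mem = EpisodicMemory()
    print(mem.reward((0.0, 0.0)))    # first state of the episode
    print(mem.reward((0.0, 0.0)))    # exact revisit: lower bonus
    print(mem.reward((10.0, 10.0)))  # far from memory: bonus near 1
```

Because the memory is emptied on `reset()`, a state that earned a low bonus late in one episode looks novel again in the next, which is one way to read the abstract's claim that the agent is encouraged to revisit states repeatedly.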

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Human Pose and Action Recognition