Exploration by Random Network Distillation

Yuri Burda; Harrison Edwards; Amos Storkey; Oleg Klimov

arXiv:1810.12894 · cs.LG · October 31, 2018 · 259 cites

TL;DR

This paper presents Random Network Distillation (RND), an easy-to-implement exploration bonus for deep reinforcement learning that enables significant progress on several hard-exploration Atari games and, on Montezuma's Revenge, exceeds average human performance without demonstrations.

Contribution

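Introduces RND, an exploration bonus defined as the error of a neural network trained to predict the features that a fixed, randomly initialized network produces for each observation. The bonus adds minimal computational overhead and is paired with a method to flexibly combine intrinsic and extrinsic rewards.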
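The bonus described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the network sizes, the learning rate, the synthetic observations, and the choice of a linear predictor (used only to keep the gradient step short) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, HIDDEN, FEAT_DIM = 8, 16, 4  # hypothetical sizes

# Fixed, randomly initialized target network: it is never trained.
W1 = rng.normal(0.0, 1.0 / np.sqrt(OBS_DIM), (OBS_DIM, HIDDEN))
W2 = rng.normal(0.0, 1.0 / np.sqrt(HIDDEN), (HIDDEN, FEAT_DIM))

def target_features(obs):
    """Features of the observations given by the fixed random network."""
    return np.tanh(obs @ W1) @ W2

# Predictor trained to match the target's features.
# Assumption: a linear model stands in for a full neural network.
predictor = {"W": np.zeros((OBS_DIM, FEAT_DIM)), "b": np.zeros(FEAT_DIM)}

def rnd_bonus(obs):
    """Per-observation exploration bonus: squared prediction error."""
    err = obs @ predictor["W"] + predictor["b"] - target_features(obs)
    return np.sum(err ** 2, axis=1)

def train_step(obs, lr=0.05):
    """One gradient-descent step on the mean squared prediction error."""
    err = obs @ predictor["W"] + predictor["b"] - target_features(obs)
    predictor["W"] -= lr * (2.0 * obs.T @ err / len(obs))
    predictor["b"] -= lr * (2.0 * err.mean(axis=0))

# Synthetic stand-ins for frequently visited and rarely visited observations.
familiar = rng.normal(0.0, 1.0, (256, OBS_DIM))
novel = rng.normal(0.0, 1.0, (256, OBS_DIM)) + 5.0

bonus_before = rnd_bonus(familiar).mean()
for _ in range(500):
    train_step(familiar)
bonus_familiar = rnd_bonus(familiar).mean()
bonus_novel = rnd_bonus(novel).mean()
```

In this toy setup the bonus shrinks on observations the predictor has been trained on and stays comparatively large on observations far from the training data, which is the property that makes the prediction error usable as a novelty signal.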

Findings

01

Achieves state-of-the-art results on Montezuma's Revenge

02

To the authors' knowledge, the first method to exceed average human performance on this game without demonstrations or access to the underlying game state

03

Occasionally completes a level of Montezuma's Revenge

Abstract

We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed. The bonus is the error of a neural network predicting features of the observations given by a fixed randomly initialized neural network. We also introduce a method to flexibly combine intrinsic and extrinsic rewards. We find that the random network distillation (RND) bonus combined with this increased flexibility enables significant progress on several hard exploration Atari games. In particular, we establish state-of-the-art performance on Montezuma's Revenge, a game famously difficult for deep reinforcement learning methods. To the best of our knowledge, this is the first method that achieves better than average human performance on this game without using demonstrations or having access to the underlying state of the game, and…
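The abstract mentions a method to flexibly combine intrinsic and extrinsic rewards but does not spell it out here. The sketch below shows one possible way to do this, under stated assumptions: each reward stream gets its own discounted return with its own discount factor, the extrinsic return resets at episode boundaries while the intrinsic return does not, and the two are mixed with fixed weights. The function name, discount factors, weights, and reward values are hypothetical.

```python
import numpy as np

def discounted_returns(rewards, dones, gamma, episodic):
    """Discounted return per step; dones[t] marks the last step of an episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        if episodic and dones[t]:
            running = 0.0  # do not carry return across the episode boundary
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Hypothetical rollout of four steps; the episode ends at step index 2.
ext_rewards = np.array([0.0, 0.0, 1.0, 0.0])
int_rewards = np.array([1.0, 1.0, 1.0, 1.0])  # e.g. exploration bonuses
dones = np.array([False, False, True, False])

# Assumed settings: separate discounts, episodic extrinsic stream,
# non-episodic intrinsic stream, fixed mixing weights.
ext_returns = discounted_returns(ext_rewards, dones, gamma=0.9, episodic=True)
int_returns = discounted_returns(int_rewards, dones, gamma=0.5, episodic=False)

EXT_COEF, INT_COEF = 2.0, 1.0
combined = EXT_COEF * ext_returns + INT_COEF * int_returns
```

Keeping the streams separate until the final weighted sum is what gives the flexibility: the discount factor and the episodic treatment can be chosen independently for each kind of reward.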

Code & Models

Models

Adilbai/Pyramids-RL-agent-ppo · Hugging Face model · 10 downloads · 2 likes

Videos

Building a Curious AI With Random Network Distillation · YouTube

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification