Noisy Networks for Exploration

Meire Fortunato; Mohammad Gheshlaghi Azar; Bilal Piot; Jacob Menick; Ian Osband; Alex Graves; Vlad Mnih; Remi Munos; Demis Hassabis; Olivier Pietquin; Charles Blundell; Shane Legg

arXiv:1706.10295 · cs.LG · November 1, 2019 · 388 cites

TL;DR

NoisyNet adds learnable parametric noise to the weights of deep reinforcement learning agents. The resulting stochasticity in the policy drives exploration, and replacing epsilon-greedy or entropy-based exploration with it substantially improves scores across a wide range of Atari games.

Contribution

The paper presents NoisyNet, a simple yet effective method of incorporating learnable noise into network weights to improve exploration in deep reinforcement learning.
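
As a sketch of what "learnable noise in the network weights" means, a noisy linear layer with input $x \in \mathbb{R}^p$ and output $y \in \mathbb{R}^q$ can be written as below. The symbols are introduced here for illustration and do not appear elsewhere on this page: $\mu^w, \sigma^w \in \mathbb{R}^{q \times p}$ and $\mu^b, \sigma^b \in \mathbb{R}^q$ are learned, $\varepsilon^w, \varepsilon^b$ are zero-mean noise samples, and $\odot$ is element-wise multiplication.

$$
y = (\mu^w + \sigma^w \odot \varepsilon^w)\, x + \mu^b + \sigma^b \odot \varepsilon^b
$$

Setting $\sigma^w = 0$ and $\sigma^b = 0$ recovers an ordinary linear layer $y = \mu^w x + \mu^b$.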

Findings

01. NoisyNet outperforms conventional exploration heuristics on a wide range of Atari games.

02. Replacing the entropy reward in A3C and epsilon-greedy in DQN and dueling agents with NoisyNet yields substantially higher scores.

03. In some games, NoisyNet advances the agent from sub-human to super-human performance.

Abstract

We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration. The parameters of the noise are learned with gradient descent along with the remaining network weights. NoisyNet is straightforward to implement and adds little computational overhead. We find that replacing the conventional exploration heuristics for A3C, DQN and dueling agents (entropy reward and $\epsilon$-greedy respectively) with NoisyNet yields substantially higher scores for a wide range of Atari games, in some cases advancing the agent from sub to super-human performance.
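
The abstract says the noise parameters are learned by gradient descent together with the remaining weights. The following is a minimal pure-Python sketch of what that could look like for a single layer, assuming independent unit-Gaussian noise on every weight and bias. The function names (`init_noisy_linear`, `sample_noise`, `forward`, `grads`), the initialisation range, and the default `sigma_init` are illustrative assumptions, not taken from this page or from any particular repository.

```python
import math
import random


def init_noisy_linear(p, q, sigma_init=0.017, rng=random):
    """Create parameters for a noisy linear layer with p inputs and q outputs.

    mu_* play the role of ordinary weights and biases; sigma_* scale the
    injected noise. The init range and sigma_init are illustrative choices.
    """
    bound = math.sqrt(3.0 / p)
    return {
        "mu_w": [[rng.uniform(-bound, bound) for _ in range(p)] for _ in range(q)],
        "sigma_w": [[sigma_init] * p for _ in range(q)],
        "mu_b": [rng.uniform(-bound, bound) for _ in range(q)],
        "sigma_b": [sigma_init] * q,
    }


def sample_noise(p, q, rng=random):
    """Draw independent unit-Gaussian noise for every weight and bias."""
    eps_w = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(q)]
    eps_b = [rng.gauss(0.0, 1.0) for _ in range(q)]
    return eps_w, eps_b


def forward(params, x, eps_w, eps_b):
    """Compute y = (mu_w + sigma_w * eps_w) x + mu_b + sigma_b * eps_b."""
    y = []
    for j in range(len(params["mu_b"])):
        acc = params["mu_b"][j] + params["sigma_b"][j] * eps_b[j]
        for i, x_i in enumerate(x):
            w = params["mu_w"][j][i] + params["sigma_w"][j][i] * eps_w[j][i]
            acc += w * x_i
        y.append(acc)
    return y


def grads(x, eps_w, eps_b, dy):
    """Gradients of a loss w.r.t. all four parameter groups, given dy = dL/dy.

    The sampled noise is held fixed, so each sigma gradient is the matching
    mu gradient scaled element-wise by the noise sample.
    """
    n_out, n_in = len(dy), len(x)
    g_mu_w = [[dy[j] * x[i] for i in range(n_in)] for j in range(n_out)]
    g_sigma_w = [[dy[j] * x[i] * eps_w[j][i] for i in range(n_in)] for j in range(n_out)]
    g_mu_b = list(dy)
    g_sigma_b = [dy[j] * eps_b[j] for j in range(n_out)]
    return g_mu_w, g_sigma_w, g_mu_b, g_sigma_b
```

Because the sigma gradients are available from the same backward pass as the mu gradients, the layer can be trained with the agent's usual optimiser, and passing all-zero noise to `forward` gives the output of a plain linear layer.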

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research

Methods: Double Q-learning · Softmax · Entropy Regularization · A3C · Dueling Network · Q-Learning · NoisyNet-A3C · NoisyNet-Dueling · NoisyNet-DQN · Noisy Linear Layer