Deep Exploration via Bootstrapped DQN

Ian Osband; Charles Blundell; Alexander Pritzel; Benjamin Van Roy

arXiv:1602.04621 · cs.LG · July 5, 2016 · 463 citations

TL;DR

Bootstrapped DQN introduces a randomized value function approach for deep, efficient exploration in reinforcement learning, significantly improving learning speed and performance in complex environments like Atari games.

Contribution

The paper presents bootstrapped DQN, a simple algorithm that uses randomized value functions to achieve deep (temporally-extended) exploration, and reports that it outperforms dithering strategies such as epsilon-greedy on complex tasks.

Findings

01. Faster learning in complex stochastic MDPs.

02. Improved learning times and performance across most Atari games.

03. Deep exploration can lead to exponentially faster learning than dithering strategies such as epsilon-greedy.

Abstract

Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through use of randomized value functions. Unlike dithering strategies such as epsilon-greedy exploration, bootstrapped DQN carries out temporally-extended (or deep) exploration; this can lead to exponentially faster learning. We demonstrate these benefits in complex stochastic MDPs and in the large-scale Arcade Learning Environment. Bootstrapped DQN substantially improves learning times and performance across most Atari games.
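
To make the contrast with dithering concrete, here is a minimal sketch of the idea in tabular form. It is not the authors' implementation: small Q-tables stand in for the network's value-function heads, and the toy chain environment, the function names, and every hyperparameter (`n_heads`, `mask_prob`, `lr`, and so on) are assumptions chosen for illustration. One value function is sampled at the start of each episode and followed greedily until the episode ends, which is what makes the exploration temporally extended.

```python
import random


def greedy(q_row, rng):
    """Return an argmax action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def run_bootstrapped_q(n_states=6, n_heads=5, episodes=200, horizon=12,
                       mask_prob=0.5, lr=0.5, gamma=0.95, seed=0):
    """Tabular stand-in for bootstrapped DQN on a toy chain (hypothetical setup).

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    The only reward (1.0) is given on reaching the right-most state.
    """
    rng = random.Random(seed)
    # One Q-table per head; small random initial values make the heads disagree.
    q = [[[rng.gauss(0.0, 0.1) for _ in range(2)] for _ in range(n_states)]
         for _ in range(n_heads)]
    returns = []
    for _ in range(episodes):
        head = rng.randrange(n_heads)  # sample one value function per episode
        state, total = 0, 0.0
        for _ in range(horizon):
            # Act greedily for the sampled head: no per-step random dithering.
            action = greedy(q[head][state], rng)
            if action == 1:
                nxt = min(state + 1, n_states - 1)
            else:
                nxt = max(state - 1, 0)
            done = nxt == n_states - 1
            reward = 1.0 if done else 0.0
            total += reward
            # Bootstrap mask: each head trains on this transition
            # with probability mask_prob, so heads see different data.
            for k in range(n_heads):
                if rng.random() < mask_prob:
                    bootstrap = 0.0 if done else gamma * max(q[k][nxt])
                    target = reward + bootstrap
                    q[k][state][action] += lr * (target - q[k][state][action])
            state = nxt
            if done:
                break
        returns.append(total)
    return q, returns


if __name__ == "__main__":
    _, returns = run_bootstrapped_q()
    print("episodes reaching the goal:", int(sum(returns)), "of", len(returns))
```

An epsilon-greedy agent would instead perturb individual actions at random, so consecutive steps are not committed to any single hypothesis about the environment. Here the randomness enters once per episode through the choice of head.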

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Advanced Bandit Algorithms Research

Methods: Q-Learning · Dense Connections · Convolution · Deep Q-Network