Deep Exploration via Bootstrapped DQN
Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy

TL;DR
Bootstrapped DQN uses randomized value functions to carry out deep (temporally-extended) exploration in reinforcement learning, substantially improving learning times and performance across most Atari games in the Arcade Learning Environment.
Contribution
The paper presents a simple algorithm, bootstrapped DQN, that explores in a computationally and statistically efficient manner through randomized value functions, in contrast to dithering strategies such as epsilon-greedy.
Findings
Faster learning in complex stochastic MDPs.
Improved learning times and performance across most Atari games.
Deep exploration can lead to exponentially faster learning than dithering strategies such as epsilon-greedy.
Abstract
Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through use of randomized value functions. Unlike dithering strategies such as epsilon-greedy exploration, bootstrapped DQN carries out temporally-extended (or deep) exploration; this can lead to exponentially faster learning. We demonstrate these benefits in complex stochastic MDPs and in the large-scale Arcade Learning Environment. Bootstrapped DQN substantially improves learning times and performance across most Atari games.
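The idea of exploring with randomized value functions can be illustrated with a minimal tabular sketch. It is not the authors' implementation: the abstract does not specify the mechanics, so the chain environment, the function name `run_bootstrapped_q`, and the parameters `n_heads`, `mask_prob`, `lr`, and `gamma` are all assumptions made for illustration. The sketch keeps several independently initialized Q-tables ("heads"), samples one head per episode, acts greedily with respect to it for the whole episode (which is what makes the exploration temporally extended rather than per-step dithering), and updates each head on a random subset of the transitions.

```python
import numpy as np

def run_bootstrapped_q(n_states=6, n_heads=5, episodes=200,
                       mask_prob=0.5, lr=0.5, gamma=0.99, seed=0):
    """Tabular sketch of bootstrapped value functions on a toy chain.

    Hypothetical environment: states 0..n_states-1, action 0 moves left,
    action 1 moves right, reward 1 on reaching the rightmost state.
    """
    rng = np.random.default_rng(seed)
    # One Q-table per head; random initialization makes the heads disagree.
    q = rng.normal(0.0, 1.0, size=(n_heads, n_states, 2))
    horizon = n_states + 3

    for _ in range(episodes):
        # Sample one head and follow it greedily for the entire episode.
        head = int(rng.integers(n_heads))
        s = 0
        for _ in range(horizon):
            a = int(np.argmax(q[head, s]))
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0

            # Bootstrap mask: each head sees this transition with
            # probability mask_prob, so heads train on different data.
            mask = rng.random(n_heads) < mask_prob
            # Per-head Q-learning target; no bootstrapping past terminal.
            not_done = 0.0 if done else 1.0
            target = r + gamma * not_done * q[:, s_next].max(axis=1)
            q[mask, s, a] += lr * (target[mask] - q[mask, s, a])

            s = s_next
            if done:
                break
    return q

if __name__ == "__main__":
    q = run_bootstrapped_q()
    # Greedy action of each head in each state (1 = move right).
    print(np.argmax(q, axis=2))
```

Because the heads start from different random values and see different subsets of the data, they can disagree about which actions look promising, and committing to one head per episode turns that disagreement into multi-step exploratory behavior. This toy version only shows the mechanics and makes no claim about reproducing the paper's results.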
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Advanced Bandit Algorithms Research
Methods: Q-Learning · Dense Connections · Convolution · Deep Q-Network
