UCB Exploration via Q-Ensembles
Richard Y. Chen, Szymon Sidor, Pieter Abbeel, John Schulman

TL;DR
This paper introduces a UCB-based exploration strategy using Q-ensembles for deep reinforcement learning, demonstrating significant performance improvements on Atari benchmarks.
Contribution
It adapts UCB exploration from bandit algorithms to deep Q-learning using ensembles, providing a novel approach for more effective exploration.
Findings
Significant gains on Atari benchmark tasks
Effective adaptation of UCB exploration to deep Q-learning
Q-ensembles improve exploration efficiency
Abstract
We show how an ensemble of Q-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well established algorithms from the bandit setting, and adapt them to the Q-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB). Our experiments show significant gains on the Atari benchmark.
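The abstract does not spell out the selection rule, so the following is only a minimal sketch of how a UCB-style choice over a Q-ensemble might look. It assumes the upper bound for each action is the ensemble mean plus a scaled ensemble standard deviation. The function name `ucb_action`, the coefficient `lam`, and the toy Q-values are illustrative assumptions, not taken from this page.

```python
from statistics import mean, pstdev


def ucb_action(q_values, lam=0.1):
    """Pick an action from an ensemble of Q estimates for one state.

    q_values: list of K lists; q_values[k][a] is ensemble member k's
              estimate for action a (hypothetical layout).
    lam:      exploration coefficient (assumed hyperparameter).
    """
    num_actions = len(q_values[0])
    scores = []
    for a in range(num_actions):
        per_member = [member[a] for member in q_values]
        # Optimistic score: average estimate plus a bonus that grows
        # with disagreement between ensemble members.
        scores.append(mean(per_member) + lam * pstdev(per_member))
    # Index of the highest-scoring action (first one on ties).
    return max(range(num_actions), key=scores.__getitem__)


# Toy example: three ensemble members, two actions.
# All members agree on action 0; they disagree on action 1.
q = [[1.0, 0.5], [1.0, 1.5], [1.0, 0.4]]
greedy = ucb_action(q, lam=0.0)      # ignores disagreement
optimistic = ucb_action(q, lam=1.0)  # rewards disagreement
```

With `lam=0.0` the rule reduces to acting greedily on the ensemble mean; a larger `lam` shifts the choice toward actions on which the ensemble members disagree, which is the sense in which the bound encourages exploration.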
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Optimization and Search Problems
