UCB Exploration via Q-Ensembles

Richard Y. Chen; Szymon Sidor; Pieter Abbeel; John Schulman

arXiv:1706.01502 · cs.LG · November 9, 2017 · 77 cites

TL;DR

This paper introduces a UCB-based exploration strategy using Q-ensembles for deep reinforcement learning, demonstrating significant performance improvements on Atari benchmarks.

Contribution

The paper adapts upper-confidence-bound (UCB) exploration from the bandit setting to deep Q-learning, using an ensemble of Q-functions to guide exploration.

Findings

01 Significant gains on Atari benchmark tasks

02 Effective adaptation of UCB exploration to deep Q-learning

03 Q-ensembles improve exploration efficiency

Abstract

We show how an ensemble of $Q^{*}$-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well-established algorithms from the bandit setting, and adapt them to the $Q$-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB). Our experiments show significant gains on the Atari benchmark.
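The abstract does not spell out the action-selection rule, so the following is only a minimal sketch of one way a UCB rule over a $Q$-ensemble could look. It assumes the agent picks the action that maximizes the ensemble mean of the $Q$-estimates plus a bonus proportional to their standard deviation across ensemble members. The function name `ucb_action`, the coefficient `lam`, and its default value are hypothetical and chosen for illustration.

```python
from statistics import fmean, pstdev
from typing import Sequence


def ucb_action(q_values: Sequence[Sequence[float]], lam: float = 0.1) -> int:
    """Pick an action from ensemble Q-estimates for a single state.

    q_values[k][a] is the estimate of ensemble member k for action a.
    Assumed rule (illustrative): argmax over a of mean_k Q + lam * std_k Q.
    """
    num_actions = len(q_values[0])
    scores = []
    for a in range(num_actions):
        # Collect every member's estimate for this action.
        per_member = [member[a] for member in q_values]
        # Mean is the value estimate; population std is the uncertainty bonus.
        scores.append(fmean(per_member) + lam * pstdev(per_member))
    # Index of the highest-scoring action (first one wins on ties).
    return max(range(num_actions), key=scores.__getitem__)


# Two ensemble members, two actions: the members agree on action 0
# and disagree on action 1.
estimates = [[1.0, 0.0], [1.0, 1.6]]
greedy = ucb_action(estimates, lam=0.0)      # no bonus: highest mean wins
optimistic = ucb_action(estimates, lam=1.0)  # bonus favors the uncertain action
```

With `lam=0.0` the rule reduces to acting greedily on the ensemble mean. Larger values of `lam` shift the choice toward actions on which the ensemble members disagree, which is the sense in which disagreement serves as an exploration signal in this sketch.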

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Optimization and Search Problems