Deep Exploration via Randomized Value Functions

Ian Osband; Benjamin Van Roy; Daniel Russo; Zheng Wen

arXiv:1703.07608 · stat.ML · September 25, 2019 · 68 citations

TL;DR

This paper studies randomized value functions as a way to guide deep exploration in reinforcement learning, combining statistically and computationally efficient exploration with common practical approaches to value function learning. It is supported by a regret bound for tabular representations and by computational studies.

Contribution

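It presents several reinforcement learning algorithms that leverage randomized value functions, proves a regret bound that establishes statistical efficiency with a tabular representation, and demonstrates the algorithms' efficacy through computational studies.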

Findings

01

The proposed algorithms demonstrate their efficacy in computational studies.

02

A regret bound establishes statistical efficiency with a tabular representation.

03

Randomized value functions facilitate effective deep exploration.

Abstract

We study the use of randomized value functions to guide deep exploration in reinforcement learning. This offers an elegant means for synthesizing statistically and computationally efficient exploration with common practical approaches to value function learning. We present several reinforcement learning algorithms that leverage randomized value functions and demonstrate their efficacy through computational studies. We also prove a regret bound that establishes statistical efficiency with a tabular representation.
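
As a rough illustration of what "randomized value functions" can mean in the tabular case, the following is a minimal sketch, not the paper's algorithm as published. It assumes a finite-horizon problem, a zero-mean Gaussian prior, and Gaussian observation noise. The names `sample_q_values`, `greedy_action`, `noise_std`, and `prior_std` are hypothetical and chosen only for this example. Each call draws one random Q-table by backward induction over logged transitions. An agent would then act greedily with respect to that draw for a whole episode, which is one way to obtain exploration that is consistent across multiple time steps.

```python
import random


def sample_q_values(data, n_states, n_actions, horizon,
                    noise_std=1.0, prior_std=1.0, rng=None):
    """Draw one randomized Q-table by perturbed backward induction.

    data[h] is a list of (state, action, reward, next_state) tuples
    observed at step h. All parameter names are illustrative assumptions.
    """
    rng = rng or random.Random()
    # q[h][s][a]; the extra row at index `horizon` is the zero terminal value.
    q = [[[0.0] * n_actions for _ in range(n_states)]
         for _ in range(horizon + 1)]
    for h in reversed(range(horizon)):
        # Accumulate visit counts and summed regression targets per (s, a).
        count = [[0] * n_actions for _ in range(n_states)]
        target_sum = [[0.0] * n_actions for _ in range(n_states)]
        for s, a, r, s_next in data[h]:
            count[s][a] += 1
            target_sum[s][a] += r + max(q[h + 1][s_next])
        for s in range(n_states):
            for a in range(n_actions):
                # Gaussian posterior for a one-hot feature, zero-mean prior.
                precision = count[s][a] / noise_std ** 2 + 1.0 / prior_std ** 2
                mean = (target_sum[s][a] / noise_std ** 2) / precision
                # Sample instead of using the mean: this is the randomization.
                q[h][s][a] = rng.gauss(mean, precision ** -0.5)
    return q


def greedy_action(q, h, s):
    """Return the action with the largest sampled value at step h, state s."""
    return max(range(len(q[h][s])), key=lambda a: q[h][s][a])
```

In this sketch, state-action pairs with few observations keep a wide sampling distribution, so they occasionally receive optimistic sampled values and get tried. Well-observed pairs concentrate around their empirical targets.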

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning