Deep Exploration via Randomized Value Functions
Ian Osband, Benjamin Van Roy, Daniel Russo, Zheng Wen

TL;DR
This paper uses randomized value functions to guide deep exploration in reinforcement learning, combining statistically and computationally efficient exploration with common practical approaches to value function learning. The approach is supported by a regret bound and by computational studies.
Contribution
It presents several reinforcement learning algorithms that use randomized value functions, proves a regret bound for the tabular case, and demonstrates the algorithms in computational studies.
Findings
The proposed algorithms demonstrate their efficacy in computational studies.
A regret bound establishes statistical efficiency with a tabular representation.
Randomized value functions facilitate effective deep exploration.
Abstract
We study the use of randomized value functions to guide deep exploration in reinforcement learning. This offers an elegant means for synthesizing statistically and computationally efficient exploration with common practical approaches to value function learning. We present several reinforcement learning algorithms that leverage randomized value functions and demonstrate their efficacy through computational studies. We also prove a regret bound that establishes statistical efficiency with a tabular representation.
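The abstract does not spell out the algorithms themselves, so the following is only an illustrative sketch of the general idea of a randomized value function, not the authors' method. It assumes a small tabular chain environment with known deterministic transitions and unknown rewards, a zero prior on rewards, and Gaussian noise whose scale shrinks with visit counts. All names (`sample_randomized_q`, `run_episodes`, `noise_scale`) and the environment are hypothetical choices made for this example. At the start of each episode the agent draws one perturbed Q-table and then acts greedily on it for the whole episode, which is what lets a single random draw commit the agent to a multi-step exploratory plan.

```python
import random


def sample_randomized_q(counts, reward_sums, next_state, horizon, rng, noise_scale=1.0):
    """Draw one randomized Q-table by backward induction over a finite horizon.

    Assumption for this sketch: transitions are known and deterministic, and
    only rewards are uncertain. Each reward estimate is perturbed with Gaussian
    noise that shrinks as the state-action pair is visited more often.
    """
    n_states = len(counts)
    n_actions = len(counts[0])
    # q[h][s][a]; the extra layer at h == horizon holds terminal values of zero.
    q = [[[0.0] * n_actions for _ in range(n_states)] for _ in range(horizon + 1)]
    for h in range(horizon - 1, -1, -1):
        for s in range(n_states):
            for a in range(n_actions):
                n = counts[s][a]
                mean_reward = reward_sums[s][a] / (n + 1)  # shrinks toward a zero prior
                noise = rng.gauss(0.0, noise_scale / (n + 1) ** 0.5)
                q[h][s][a] = mean_reward + noise + max(q[h + 1][next_state(s, a)])
    return q


def run_episodes(n_episodes=200, n_states=6, seed=0):
    """Run the sketch on a hypothetical chain: action 1 moves right, action 0 moves left.

    The only reward is for stepping right from the second-to-last state, which
    requires moving right at every step of the episode.
    """
    rng = random.Random(seed)
    n_actions = 2
    horizon = n_states - 1

    def next_state(s, a):
        return min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)

    def reward(s, a):
        return 1.0 if (s == n_states - 2 and a == 1) else 0.0

    counts = [[0] * n_actions for _ in range(n_states)]
    reward_sums = [[0.0] * n_actions for _ in range(n_states)]
    returns = []
    for _ in range(n_episodes):
        # One random draw per episode, then greedy action selection throughout.
        q = sample_randomized_q(counts, reward_sums, next_state, horizon, rng)
        s, total = 0, 0.0
        for h in range(horizon):
            a = max(range(n_actions), key=lambda act: q[h][s][act])
            r = reward(s, a)
            counts[s][a] += 1
            reward_sums[s][a] += r
            total += r
            s = next_state(s, a)
        returns.append(total)
    return returns
```

Calling `run_episodes()` returns the list of per-episode returns, each either 0.0 or 1.0 in this toy setup. Resampling once per episode rather than once per step is the design choice that matters here: per-step noise would tend to cancel out over a long chain, while a single draw keeps the agent's behaviour consistent across the episode.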
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
