Randomized Prior Functions for Deep Reinforcement Learning
Ian Osband, John Aslanides, Albin Cassirer

TL;DR
This paper improves uncertainty estimation in deep reinforcement learning by adding a randomized, untrainable prior function to each ensemble member, an approach that scales to large problems far better than previous attempts.
Contribution
The paper proposes a simple, scalable method for uncertainty estimation in deep RL based on randomized prior functions. It addresses a shortcoming of bootstrap ensembles, which have no mechanism for uncertainty that does not come from the observed data.
Findings
The method is provably efficient with linear representations.
Simple illustrations show that it is effective with nonlinear representations.
It scales to large problems far better than previous attempts.
Abstract
Dealing with uncertainty is essential for efficient reinforcement learning. There is a growing literature on uncertainty estimation for deep learning from fixed datasets, but many of the most popular approaches are poorly-suited to sequential decision problems. Other methods, such as bootstrap sampling, have no mechanism for uncertainty that does not come from the observed data. We highlight why this can be a crucial shortcoming and propose a simple remedy through addition of a randomized untrainable 'prior' network to each ensemble member. We prove that this approach is efficient with linear representations, provide simple illustrations of its efficacy with nonlinear representations and show that this approach scales to large-scale problems far better than previous attempts.
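The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a tabular (one-hot) linear representation, a standard Gaussian prior, and ridge regularization, and the names fit_member, ensemble_std, beta, and ridge are hypothetical, chosen only for this example. Each ensemble member predicts trainable[s] + beta * prior[s], where prior is sampled once and never trained, and trainable is fit on a bootstrap resample of the data.

```python
import random


def fit_member(data, n_states, beta, ridge, rng):
    """Fit one ensemble member on a bootstrap resample of (state, value) pairs.

    The member predicts trainable[s] + beta * prior[s]; only `trainable` is fit.
    """
    # Frozen random prior (assumed standard Gaussian); it is never trained.
    prior = [rng.gauss(0.0, 1.0) for _ in range(n_states)]
    # Bootstrap resample of the observed data, with replacement.
    sample = [rng.choice(data) for _ in data]
    sums = [0.0] * n_states
    counts = [0] * n_states
    for s, y in sample:
        sums[s] += y - beta * prior[s]  # regress on the residual target
        counts[s] += 1
    # With one-hot features, ridge regression decouples into per-state
    # shrunk means of the residual targets.
    trainable = [sums[s] / (counts[s] + ridge) for s in range(n_states)]
    return [trainable[s] + beta * prior[s] for s in range(n_states)]


def ensemble_std(data, n_states, n_members=50, beta=1.0, ridge=0.1, seed=0):
    """Per-state standard deviation of predictions across ensemble members."""
    rng = random.Random(seed)
    members = [fit_member(data, n_states, beta, ridge, rng)
               for _ in range(n_members)]
    stds = []
    for s in range(n_states):
        vals = [m[s] for m in members]
        mean = sum(vals) / len(vals)
        stds.append((sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5)
    return stds


# State 0 is observed 30 times with value 1.0; state 1 is never observed.
data = [(0, 1.0)] * 30
stds = ensemble_std(data, n_states=2)
print(stds)
```

In this toy setting the members nearly agree on the observed state, because the trainable part learns to cancel the prior there. On the unobserved state the trainable part stays at zero, so the members disagree by roughly the spread of the prior. A plain bootstrap ensemble without the prior term would predict zero for the unobserved state in every member and so report no uncertainty there.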
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Advanced Multi-Objective Optimization Algorithms
