Randomized Prior Functions for Deep Reinforcement Learning

Ian Osband; John Aslanides; Albin Cassirer

arXiv:1806.03335 · stat.ML · November 16, 2018 · 105 cites

TL;DR

This paper improves uncertainty estimation in deep reinforcement learning by adding a randomized, untrainable prior function to each ensemble member, so that uncertainty no longer comes only from the observed data. The authors report that the approach scales to large problems far better than previous attempts.

Contribution

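It proposes a simple, scalable remedy for a shortcoming of bootstrap ensembles in deep RL, namely that they have no mechanism for uncertainty that does not come from the observed data: each ensemble member is paired with a randomized prior function that is never trained.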

Findings

01. The method is provably efficient with linear representations.

02. Simple illustrations show that it is also effective with nonlinear representations.

03. It scales to large problems far better than previous attempts.

Abstract

Dealing with uncertainty is essential for efficient reinforcement learning. There is a growing literature on uncertainty estimation for deep learning from fixed datasets, but many of the most popular approaches are poorly suited to sequential decision problems. Other methods, such as bootstrap sampling, have no mechanism for uncertainty that does not come from the observed data. We highlight why this can be a crucial shortcoming and propose a simple remedy through addition of a randomized untrainable 'prior' network to each ensemble member. We prove that this approach is efficient with linear representations, provide simple illustrations of its efficacy with nonlinear representations and show that this approach scales to large-scale problems far better than previous attempts.
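
The idea in the abstract can be sketched on a toy regression problem. The block below is a minimal illustration in pure Python, not the authors' implementation and not the code in the linked repository. Each ensemble member predicts with the sum of a trainable network and a fixed random 'prior' network, and only the trainable part receives gradient updates. The names `PriorMember`, `train_ensemble` and `spread`, the scale factor `prior_scale=3.0`, the network size, and the learning rate are all assumptions made for this sketch.

```python
import math
import random


def make_net(rng, n_hidden=16):
    """Randomly initialise a one-hidden-layer tanh network mapping R -> R."""
    return {
        "w1": [rng.gauss(0.0, 1.0) for _ in range(n_hidden)],
        "b1": [rng.gauss(0.0, 1.0) for _ in range(n_hidden)],
        "w2": [rng.gauss(0.0, 1.0) / math.sqrt(n_hidden) for _ in range(n_hidden)],
    }


def forward(net, x):
    """Evaluate the network at scalar input x."""
    return sum(
        w2 * math.tanh(w1 * x + b1)
        for w1, b1, w2 in zip(net["w1"], net["b1"], net["w2"])
    )


class PriorMember:
    """One ensemble member: a trainable network plus a fixed random prior."""

    def __init__(self, rng, prior_scale=3.0):
        self.prior = make_net(rng)      # sampled once, never updated
        self.trainable = make_net(rng)  # the only parameters SGD touches
        self.prior_scale = prior_scale  # assumed value, chosen for illustration

    def predict(self, x):
        return forward(self.trainable, x) + self.prior_scale * forward(self.prior, x)

    def sgd_step(self, x, y, lr=0.02):
        """One squared-error gradient step on the trainable network only."""
        net = self.trainable
        err = self.predict(x) - y
        for i in range(len(net["w2"])):
            h = math.tanh(net["w1"][i] * x + net["b1"][i])
            back = err * net["w2"][i] * (1.0 - h * h)  # gradient reaching unit i
            net["w2"][i] -= lr * err * h
            net["w1"][i] -= lr * back * x
            net["b1"][i] -= lr * back


def train_ensemble(data, n_members=5, steps=500, seed=0):
    """Train each member on its own bootstrap resample of the data."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_members):
        member = PriorMember(rng)
        boot = [rng.choice(data) for _ in data]  # sample with replacement
        for _ in range(steps):
            x, y = rng.choice(boot)
            member.sgd_step(x, y)
        ensemble.append(member)
    return ensemble


def spread(ensemble, x):
    """Standard deviation of member predictions at x (a crude uncertainty proxy)."""
    preds = [m.predict(x) for m in ensemble]
    mean = sum(preds) / len(preds)
    return math.sqrt(sum((p - mean) ** 2 for p in preds) / len(preds))


if __name__ == "__main__":
    # Toy data on [-1, 1]; compare disagreement inside and outside that range.
    data = [(x / 10.0, math.sin(3.0 * x / 10.0)) for x in range(-10, 11)]
    ens = train_ensemble(data)
    for x in (0.0, 4.0):
        print(x, spread(ens, x))
```

Because the priors differ between members and are never fitted, members can keep disagreeing at inputs the data never covered, which is the kind of uncertainty that bootstrap resampling alone does not supply. How large that disagreement is in this toy setting depends on the assumed `prior_scale` and network size.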

Code & Models

Repositories

johannah/bootstrap_dqn (PyTorch)

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Advanced Multi-Objective Optimization Algorithms