Efficient Exploration through Bayesian Deep Q-Networks

Kamyar Azizzadenesheli; Animashree Anandkumar

arXiv:1802.04412 · cs.AI · September 10, 2019

TL;DR

This paper introduces Bayesian deep Q-networks (BDQN), a method that improves exploration in high-dimensional reinforcement learning by incorporating uncertainty through Bayesian linear regression and Thompson sampling, leading to faster learning.
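As a rough illustration of the mechanism the TL;DR describes, the sketch below keeps a Gaussian posterior over each action's last-layer weights w_a, updates it with standard Bayesian linear regression, and picks an action by Thompson sampling: draw one w_a per action from its posterior and act greedily on w_aᵀφ(x). It is a minimal, dependency-free sketch under assumptions that do not come from the paper: a zero-mean Gaussian prior with variance prior_var, Gaussian target noise with variance noise_var, features phi produced by some fixed network, incremental per-sample updates, and the hypothetical class name BayesianLinearQ. It is not the authors' implementation.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def solve_lower(L: Matrix, b: List[float]) -> List[float]:
    """Solve L x = b by forward substitution."""
    x: List[float] = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_t(L: Matrix, y: List[float]) -> List[float]:
    """Solve L^T x = y by back substitution (L is lower-triangular)."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / L[i][i]
    return x


class BayesianLinearQ:
    """Hypothetical helper: Gaussian posterior over last-layer weights w_a, one per action."""

    def __init__(self, n_actions: int, dim: int, prior_var: float = 1.0, noise_var: float = 1.0):
        self.dim = dim
        self.noise_var = noise_var
        # Posterior precision A_a = I / prior_var + sum(phi phi^T) / noise_var
        self.A = [[[1.0 / prior_var if i == j else 0.0 for j in range(dim)]
                   for i in range(dim)] for _ in range(n_actions)]
        # b_a = sum(phi * target) / noise_var, so the posterior mean is A_a^{-1} b_a
        self.b = [[0.0] * dim for _ in range(n_actions)]

    def update(self, action: int, phi: List[float], target: float) -> None:
        """Add one (features, regression target) pair for the chosen action."""
        A, b = self.A[action], self.b[action]
        for i in range(self.dim):
            b[i] += phi[i] * target / self.noise_var
            for j in range(self.dim):
                A[i][j] += phi[i] * phi[j] / self.noise_var

    def posterior(self, action: int) -> Tuple[List[float], Matrix]:
        """Return the posterior mean and the Cholesky factor L of the precision A_a."""
        L = cholesky(self.A[action])
        mean = solve_upper_t(L, solve_lower(L, self.b[action]))
        return mean, L

    def thompson_action(self, phi: List[float], rng: random.Random) -> int:
        """Draw w_a ~ N(mean_a, A_a^{-1}) for every action and act greedily on w_a . phi."""
        best_action, best_q = 0, -math.inf
        for a in range(len(self.A)):
            mean, L = self.posterior(a)
            z = [rng.gauss(0.0, 1.0) for _ in range(self.dim)]
            # If A = L L^T, then L^{-T} z has covariance A^{-1}.
            noise = solve_upper_t(L, z)
            q = sum((m + e) * p for m, e, p in zip(mean, noise, phi))
            if q > best_q:
                best_action, best_q = a, q
        return best_action
```

Because each action's precision grows only with the data that action has received, rarely tried actions keep wide posteriors; their sampled Q-values vary more and occasionally win the argmax, which is the exploration effect the TL;DR refers to.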

Contribution

The paper develops a Bayesian approach for deep Q-networks that models uncertainty over the Q-network's last linear layer with Bayesian linear regression and uses Thompson sampling on that posterior to drive exploration in high-dimensional RL tasks.

Findings

01

BDQN outperforms standard DQN in Atari games.

02

More efficient exploration speeds up learning.

03

Frequentist and Bayesian regret upper bounds of O(d√T), where T is the number of episodes, are established for the proposed linear algorithms, LINUCB and LINPSRL.
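A quick worked consequence of an O(d√T) regret bound (a standard reading of such bounds, not an additional result from the paper): dividing by the number of episodes T shows that the average per-episode regret vanishes as T grows. Here C is a constant independent of T, ignoring any logarithmic factors.

```latex
\mathrm{Regret}(T) \le C\, d \sqrt{T}
\quad\Longrightarrow\quad
\frac{\mathrm{Regret}(T)}{T} \le \frac{C\, d}{\sqrt{T}} \xrightarrow[T \to \infty]{} 0
```

For example, quadrupling the number of episodes only doubles the bound, since √(4T) = 2√T.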

Abstract

We study reinforcement learning (RL) in high-dimensional episodic Markov decision processes (MDPs). We consider value-based RL when the optimal Q-value is a linear function of a d-dimensional state-action feature representation. For instance, in deep Q-networks (DQN), the Q-value is a linear function of the feature representation layer (output layer). We propose two algorithms, one based on optimism, LINUCB, and another based on posterior sampling, LINPSRL. We guarantee frequentist and Bayesian regret upper bounds of O(d√T) for these two algorithms, where T is the number of episodes. We extend these methods to deep RL and propose Bayesian deep Q-networks (BDQN), which uses an efficient Thompson sampling algorithm for high-dimensional RL. We deploy the double DQN (DDQN) approach, and instead of learning the last layer of the Q-network using linear regression, we use Bayesian linear…
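The abstract notes that BDQN builds on double DQN. For reference, the standard DDQN regression target lets the online network choose the next action and the target network evaluate it. The sketch below writes that target for Q-values that are linear in features, Q(x, a) = w_aᵀφ(x), matching the last-layer view in the abstract. The function names and the choice to pass next-state features and per-action weight lists directly are illustrative assumptions, and the exact target BDQN uses (for instance, which weights select the next action) may differ.

```python
from typing import List, Sequence


def linear_q_values(weights: Sequence[Sequence[float]], phi: Sequence[float]) -> List[float]:
    """Q(x, a) = w_a . phi(x) for each action a (one weight vector per action)."""
    return [sum(w_i * p_i for w_i, p_i in zip(w_a, phi)) for w_a in weights]


def double_dqn_target(
    reward: float,
    done: bool,
    gamma: float,
    online_w: Sequence[Sequence[float]],
    target_w: Sequence[Sequence[float]],
    phi_next_online: Sequence[float],
    phi_next_target: Sequence[float],
) -> float:
    """Double DQN target: the online Q-values choose a', the target Q-values evaluate it."""
    if done:
        # Terminal transition: nothing to bootstrap from.
        return reward
    q_online = linear_q_values(online_w, phi_next_online)
    a_next = max(range(len(q_online)), key=q_online.__getitem__)
    q_target = linear_q_values(target_w, phi_next_target)
    return reward + gamma * q_target[a_next]
```

For example, with online Q-values [1, 3] and target Q-values [10, 6] at the next state, reward 1 and gamma 0.9, the DDQN target is 1 + 0.9 · 6 = 6.4, whereas a plain max over the target Q-values would give 1 + 0.9 · 10 = 10. Decoupling selection from evaluation in this way is what DDQN uses to reduce overestimation.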

Code & Models

Repositories

kazizzad/BDQN-MxNet-Gluon
mxnet · Official

Taxonomy

Methods: Experience Replay · Double Q-learning · Q-Learning · Double DQN · Dense Connections · Convolution · Deep Q-Network