TL;DR
This paper introduces Bayesian deep Q-networks (BDQN), a method that improves exploration in high-dimensional reinforcement learning by incorporating uncertainty through Bayesian linear regression and Thompson sampling, leading to faster learning.
Contribution
The paper develops a novel Bayesian approach for deep Q-networks that effectively models uncertainty and enhances exploration in high-dimensional RL tasks.
Findings
BDQN outperforms standard DQN in Atari games.
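More efficient exploration lets BDQN learn faster.
Regret upper bounds of O(d√T), where T is the number of episodes, are established for the two proposed algorithms: a frequentist bound for the optimism-based LINUCB and a Bayesian bound for the posterior-sampling-based LINPSRL.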
Abstract
We study reinforcement learning (RL) in high-dimensional episodic Markov decision processes (MDP). We consider value-based RL when the optimal Q-value is a linear function of a d-dimensional state-action feature representation. For instance, in deep-Q networks (DQN), the Q-value is a linear function of the feature representation layer (output layer). We propose two algorithms, one based on optimism, LINUCB, and another based on posterior sampling, LINPSRL. We guarantee frequentist and Bayesian regret upper bounds of O(d√T) for these two algorithms, where T is the number of episodes. We extend these methods to deep RL and propose Bayesian deep Q-networks (BDQN), which uses an efficient Thompson sampling algorithm for high-dimensional RL. We deploy the double DQN (DDQN) approach, and instead of learning the last layer of Q-network using linear regression, we use Bayesian linear…
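The idea in the abstract, replacing a point estimate of the last linear layer with a Bayesian linear regression posterior and acting by Thompson sampling, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names `blr_posterior` and `thompson_action`, the parameters `prior_var` and `noise_var`, the use of one independent Gaussian posterior per action, and the synthetic random features standing in for a network's feature layer are all assumptions made for the example.

```python
import numpy as np

def blr_posterior(features, targets, prior_var=1.0, noise_var=1.0):
    """Gaussian posterior over weights w, assuming targets ~ N(features @ w, noise_var)
    and a zero-mean isotropic Gaussian prior with variance prior_var (assumed model)."""
    d = features.shape[1]
    precision = features.T @ features / noise_var + np.eye(d) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ features.T @ targets / noise_var
    return mean, cov

def thompson_action(phi, posteriors, rng):
    """Draw one weight vector per action from its posterior, then act greedily
    on the sampled Q-values for the feature vector phi."""
    sampled_q = [rng.multivariate_normal(mean, cov) @ phi for mean, cov in posteriors]
    return int(np.argmax(sampled_q))

# Synthetic stand-in for last-layer features: 2 actions, d = 4 (hypothetical data).
rng = np.random.default_rng(0)
d, n = 4, 500
true_w = [np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0, 0.0])]
posteriors = []
for w in true_w:
    X = rng.normal(size=(n, d))               # feature vectors seen for this action
    y = X @ w + 0.1 * rng.normal(size=n)      # noisy regression targets
    posteriors.append(blr_posterior(X, y, noise_var=0.01))

phi = np.array([2.0, 0.5, 0.0, 0.0])          # features of the current state
action = thompson_action(phi, posteriors, rng)
```

With little data the posterior covariance is wide, so sampled Q-values vary and the agent tries different actions; as data accumulates the samples concentrate around the posterior mean and the choice becomes close to greedy.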
Taxonomy
Methods: Experience Replay · Double Q-learning · Q-Learning · Double DQN · Dense Connections · Convolution · Deep Q-Network
