Assumed Density Filtering Q-learning

Heejin Jeong; Clark Zhang; George J. Pappas; Daniel D. Lee

arXiv:1712.03333 · cs.LG · October 25, 2019

TL;DR

This paper introduces ADFQ, a Bayesian off-policy TD method that uses Assumed Density Filtering to update beliefs over Q-values, improving exploration, regularization, and performance in stochastic environments and in environments with large action spaces.

Contribution

The paper presents a novel Bayesian approach to off-policy TD learning, providing a closed-form update for Q-beliefs and extending it with neural networks for enhanced performance.

Findings

01. Outperforms comparable algorithms on Atari games

02. Shows significant improvements in stochastic domains

03. Handles large action spaces effectively

Abstract

While off-policy temporal difference (TD) methods have been widely used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have not been utilized as frequently. One reason is that the non-linear max operation in the Bellman optimality equation makes it difficult to define conjugate distributions over the value functions. In this paper, we introduce a novel Bayesian approach to off-policy TD methods, called ADFQ, which updates beliefs on state-action values, Q, through an online Bayesian inference method known as Assumed Density Filtering. We formulate an efficient closed-form solution for the value update by approximately estimating analytic parameters of the posterior of the Q-beliefs. Uncertainty measures in the beliefs are not only used in exploration but also provide a natural regularization for the value update considering…
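
The abstract's two technical points (the max operation breaks conjugacy, and Assumed Density Filtering projects the result back onto a tractable family) can be illustrated with a small sketch. This is not the paper's closed-form update. It assumes independent Gaussian beliefs over the next-state Q-values, approximates the moments of the TD target r + gamma * max_b Q(s', b) by Monte Carlo sampling instead of analytically, and then fuses the moment-matched Gaussian with the prior by a plain precision-weighted product. The function names `td_target_moments` and `gaussian_fuse` are hypothetical and do not come from the paper or its repository.

```python
import numpy as np


def td_target_moments(next_means, next_vars, reward, gamma,
                      n_samples=100_000, seed=0):
    """Moment-match the TD target r + gamma * max_b Q(s', b) to a Gaussian.

    Assumption: beliefs over Q(s', b) are independent Gaussians. The max of
    Gaussians is not Gaussian, so we keep only its mean and variance
    (the ADF-style projection), estimated here by sampling.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(next_means, dtype=float)
    stds = np.sqrt(np.asarray(next_vars, dtype=float))
    # One row per sample, one column per action b.
    q_samples = rng.normal(means, stds, size=(n_samples, means.size))
    target = reward + gamma * q_samples.max(axis=1)
    return target.mean(), target.var()


def gaussian_fuse(prior_mean, prior_var, target_mean, target_var):
    """Precision-weighted product of two Gaussians (a simplifying assumption,
    standing in for the paper's posterior update)."""
    precision = 1.0 / prior_var + 1.0 / target_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + target_mean / target_var)
    return post_mean, post_var


if __name__ == "__main__":
    # Belief over Q(s, a) before the update, and beliefs over Q(s', b).
    t_mean, t_var = td_target_moments([0.0, 0.5], [1.0, 0.25],
                                      reward=1.0, gamma=0.9)
    new_mean, new_var = gaussian_fuse(0.0, 4.0, t_mean, t_var)
    print(f"target ~ N({t_mean:.3f}, {t_var:.3f})")
    print(f"posterior ~ N({new_mean:.3f}, {new_var:.3f})")
```

In this simplified picture, the effective step size depends on the ratio of the prior variance to the target variance: an uncertain target moves the belief less than a confident one, which is one way to read the abstract's remark that uncertainty acts as a regularizer on the value update.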

Code & Models

Repositories

coco66/ADFQ (tf, Official)

Taxonomy

Methods: Q-Learning