Bayesian Distributional Policy Gradients

Luchen Li; A. Aldo Faisal

arXiv:2103.11265 · cs.LG · March 24, 2021

TL;DR

This paper introduces BDPG, a novel distributional RL algorithm that models state-return distributions, enabling better exploration and faster learning, demonstrated on Atari and MuJoCo benchmarks.

Contribution

BDPG models state-return distributions, rather than the state-action-return distributions used in most prior distributional RL work, and uses adversarial training to estimate return uncertainties, integrating curiosity-driven exploration into distributional RL.

Findings

01. BDPG learns faster than existing algorithms.

02. BDPG achieves higher asymptotic performance.

03. BDPG is effective in hard-exploration tasks.

Abstract

Distributional Reinforcement Learning (RL) maintains the entire probability distribution of the reward-to-go, i.e., the return, providing more learning signals that account for the uncertainty associated with policy performance, which may be beneficial for trading off exploration and exploitation and for policy learning in general. Previous works in distributional RL focused mainly on computing the state-action-return distributions; here we model the state-return distributions. This enables us to translate successful conventional RL algorithms that are based on state values into distributional RL. We formulate the distributional Bellman operation as an inference-based auto-encoding process that minimises Wasserstein metrics between target/model return distributions. The proposed algorithm, BDPG (Bayesian Distributional Policy Gradients), uses adversarial training in joint-contrastive…
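
To make the "Wasserstein metrics between target/model return distributions" phrase concrete, here is a minimal sketch of the quantity involved, not of BDPG itself. It assumes return distributions are represented by equal-size sets of scalar samples and that the Bellman target for a state is formed as reward plus discounted next-state return. The function names `wasserstein_1` and `bellman_target_samples`, the Gaussian samples, and the values of `reward` and `gamma` are all hypothetical and chosen only for illustration; the adversarial, inference-based training the abstract describes is not reproduced here.

```python
import random


def wasserstein_1(samples_a, samples_b):
    """Empirical 1-Wasserstein distance between two equal-size 1-D sample sets.

    For one-dimensional empirical distributions with the same number of
    samples, the optimal coupling pairs the sorted samples in order.
    """
    if len(samples_a) != len(samples_b):
        raise ValueError("sample sets must have equal size")
    a, b = sorted(samples_a), sorted(samples_b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def bellman_target_samples(reward, gamma, next_state_returns):
    """Sample-based state-return Bellman target: r + gamma * Z(s')."""
    return [reward + gamma * z for z in next_state_returns]


# Hypothetical data: samples of the next-state return Z(s') and of the
# current model's return distribution Z(s). Neither comes from the paper.
rng = random.Random(0)
next_returns = [rng.gauss(10.0, 2.0) for _ in range(1000)]
model_returns = [rng.gauss(9.0, 2.0) for _ in range(1000)]

target_returns = bellman_target_samples(
    reward=1.0, gamma=0.99, next_state_returns=next_returns
)
distance = wasserstein_1(model_returns, target_returns)
print(f"W1(model, target) = {distance:.3f}")
```

A learner in this setting would adjust the model's return distribution to shrink `distance`. Because the distribution is indexed by state alone rather than by state-action pair, it can stand in wherever a conventional algorithm uses a state-value estimate, which is the translation the abstract refers to.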

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research