Exploration with Multi-Sample Target Values for Distributional Reinforcement Learning

Michael Teng; Michiel van de Panne; Frank Wood

arXiv:2202.02693 · cs.LG · February 8, 2022

TL;DR

This paper introduces multi-sample target values for distributional reinforcement learning, improving distribution estimates and exploration, leading to state-of-the-art results in continuous control tasks.

Contribution

The paper proposes multi-sample target values and UCB-based exploration for distributional RL, and combines them in a new algorithm, E2DC, which the authors report achieves state-of-the-art model-free performance on continuous control tasks.
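As a rough illustration of what UCB-style exploration over a distributional critic can look like, the sketch below scores each candidate action by the mean of its quantile estimates plus a bonus proportional to their spread. This is a generic optimism bonus written under stated assumptions, not the scoring rule from the paper: the names `ucb_score`, `select_action`, and `beta`, the use of the standard deviation of the quantiles as the bonus, and the finite set of candidate actions are all hypothetical.

```python
import statistics


def ucb_score(quantiles, beta=1.0):
    """Mean of the quantile estimates plus beta times their spread.

    Assumption: the population standard deviation of the quantile
    estimates stands in for the uncertainty bonus.
    """
    mean = statistics.fmean(quantiles)
    spread = statistics.pstdev(quantiles)
    return mean + beta * spread


def select_action(candidates, beta=1.0):
    """Pick the candidate with the highest UCB score.

    candidates: dict mapping a hypothetical action label to the list of
    quantile estimates a critic might output for that action.
    """
    return max(candidates, key=lambda action: ucb_score(candidates[action], beta))


# Hypothetical critic outputs for two candidate actions.
candidates = {
    "steady": [1.0, 1.0, 1.0],  # higher mean, no spread
    "risky": [0.0, 0.9, 1.8],   # lower mean, wide spread
}
print(select_action(candidates, beta=0.0))
print(select_action(candidates, beta=1.0))
```

With `beta=0.0` the rule reduces to greedy selection on the mean; a larger `beta` favors actions whose predicted return distribution is wider.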

Findings

01 Achieves state-of-the-art results on Humanoid control

02 Demonstrates improved distribution estimates during training

03 Provides insights through visualization of learned distributions

Abstract

Distributional reinforcement learning (RL) aims to learn a value-network that predicts the full distribution of the returns for a given state, often modeled via a quantile-based critic. This approach has been successfully integrated into common RL methods for continuous control, giving rise to algorithms such as Distributional Soft Actor-Critic (DSAC). In this paper, we introduce multi-sample target values (MTV) for distributional RL, as a principled replacement for single-sample target value estimation, as commonly employed in current practice. The improved distributional estimates further lend themselves to UCB-based exploration. These two ideas are combined to yield our distributional RL algorithm, E2DC (Extra Exploration with Distributional Critics). We evaluate our approach on a range of continuous control tasks and demonstrate state-of-the-art model-free performance on difficult…
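To make the contrast between single-sample and multi-sample targets concrete, the sketch below computes a standard quantile Huber loss for a quantile-based critic against either one or several sampled target values. The abstract does not give the exact MTV formulation, so this is only a minimal illustration under assumptions: the function names, the midpoint quantile fractions, and the choice to sample targets uniformly from the next-state quantile estimates are hypothetical.

```python
import random


def quantile_huber_loss(pred_quantiles, targets, kappa=1.0):
    """Average quantile Huber loss of predicted quantiles against target samples."""
    n = len(pred_quantiles)
    total = 0.0
    for i, q in enumerate(pred_quantiles):
        tau = (i + 0.5) / n  # assumed midpoint quantile fraction
        for y in targets:
            u = y - q  # TD error: target minus prediction
            abs_u = abs(u)
            if abs_u <= kappa:
                huber = 0.5 * u * u
            else:
                huber = kappa * (abs_u - 0.5 * kappa)
            # Asymmetric weight |tau - 1{u < 0}| from quantile regression.
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber / kappa
    return total / (n * len(targets))


def target_samples(reward, gamma, next_quantiles, num_samples, rng):
    """Draw num_samples targets r + gamma * z'.

    Assumption: z' is drawn uniformly from the next-state quantile estimates.
    """
    return [reward + gamma * rng.choice(next_quantiles) for _ in range(num_samples)]


rng = random.Random(0)
pred = [0.5, 1.0, 1.5]        # hypothetical current-state quantile estimates
next_q = [0.0, 1.0, 2.0]      # hypothetical next-state quantile estimates

single = target_samples(1.0, 0.99, next_q, num_samples=1, rng=rng)
multi = target_samples(1.0, 0.99, next_q, num_samples=32, rng=rng)

print(quantile_huber_loss(pred, single))
print(quantile_huber_loss(pred, multi))
```

The only difference between the two calls is the number of target samples: with one sample the loss depends on a single draw from the next-state distribution, while with many samples each update averages over a larger part of that distribution.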

Taxonomy

Topics: Reinforcement Learning in Robotics