A Cramér Distance perspective on Quantile Regression based Distributional Reinforcement Learning

Alix Lhéritier; Nicolas Bondoux

arXiv:2110.00535·stat.ML·February 23, 2022

TL;DR

This paper explores the use of the Cramér distance in quantile regression for distributional reinforcement learning, revealing theoretical connections to the Wasserstein distance and proposing an efficient computation method.

Contribution

It proves that the Cramér distance projection coincides with the 1-Wasserstein projection and introduces a low-complexity algorithm for computing the Cramér distance in DRL.

Findings

01

Cramér distance projection coincides with 1-Wasserstein projection.

02

Squared Cramér and quantile regression losses have collinear gradients under non-crossing constraints.

03

Proposed algorithm efficiently computes the Cramér distance for DRL.
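
To make the two distances in the findings above concrete, the following Python sketch computes the 1-Wasserstein distance and the squared Cramér distance between two equal-weight empirical (staircase) distributions on the real line. Both are integrals of a power of the gap between the two CDFs: power 1 gives the 1-Wasserstein distance, power 2 gives the squared Cramér distance. This is a plain sort-and-sweep illustration, not the paper's algorithm; the function name `cdf_gap_integral` and the example atoms are hypothetical.

```python
def cdf_gap_integral(xs, ys, power):
    """Integrate |F(x) - G(x)|**power over the real line.

    F and G are the CDFs of the equal-weight empirical distributions
    with atoms xs and ys (both non-empty). Hypothetical helper for illustration.
    """
    # Each event is (position, jump in F, jump in G).
    events = [(x, 1.0 / len(xs), 0.0) for x in xs]
    events += [(y, 0.0, 1.0 / len(ys)) for y in ys]
    events.sort(key=lambda e: e[0])

    total, f, g = 0.0, 0.0, 0.0
    prev = events[0][0]
    for pos, df, dg in events:
        # The CDFs are constant on [prev, pos), so the integrand is too.
        total += abs(f - g) ** power * (pos - prev)
        f += df
        g += dg
        prev = pos
    return total


xs = [0.0, 2.0]
ys = [1.0, 3.0]
w1 = cdf_gap_integral(xs, ys, power=1)          # 1-Wasserstein distance
cramer_sq = cdf_gap_integral(xs, ys, power=2)   # squared Cramér distance
print(w1, cramer_sq)
```

The sweep costs one sort of the merged atoms, so it runs in O((n + m) log(n + m)) time for n and m atoms.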

Abstract

Distributional reinforcement learning (DRL) extends the value-based approach by approximating the full distribution over future returns instead of the mean only, providing a richer signal that leads to improved performance. Quantile Regression (QR) based methods like QR-DQN project arbitrary distributions into a parametric subset of staircase distributions by minimizing the 1-Wasserstein distance. However, due to biases in the gradients, the quantile regression loss is used instead for training, guaranteeing the same minimizer and enjoying unbiased gradients. Non-crossing constraints on the quantiles have been shown to improve the performance of QR-DQN for uncertainty-based exploration strategies. The contribution of this work is in the setting of fixed quantile levels and is twofold. First, we prove that the Cramér distance yields a projection that coincides with the 1-Wasserstein…
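
The quantile regression loss mentioned in the abstract can be sketched in a few lines of Python. The sketch assumes N fixed quantile levels at the midpoints (2i + 1) / (2N) for i = 0, ..., N - 1, the choice used by QR-DQN, and it evaluates the plain (non-Huber) pinball loss against a finite set of target samples. The names `quantile_levels`, `pinball`, and `qr_loss` are hypothetical and are not taken from the paper or its repository.

```python
def quantile_levels(n):
    """Midpoint quantile levels (2i + 1) / (2n), assumed fixed."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def pinball(u, tau):
    """Pinball loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def qr_loss(thetas, samples):
    """Quantile regression loss of quantile estimates `thetas`
    against target `samples`, averaged over samples and summed
    over quantile levels."""
    taus = quantile_levels(len(thetas))
    return sum(
        sum(pinball(z - theta, tau) for z in samples) / len(samples)
        for theta, tau in zip(thetas, taus)
    )


samples = [0.0, 1.0, 2.0, 3.0]
good = qr_loss([0.5, 2.5], samples)   # estimates near the 0.25 and 0.75 quantiles
bad = qr_loss([1.5, 1.5], samples)    # both estimates collapsed onto the median
print(good, bad)
```

Because the loss is an average over samples, its gradient with respect to each estimate can be estimated from a minibatch of target samples without bias, which is the property the abstract contrasts with the 1-Wasserstein distance.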

Code & Models

Repositories

alherit/cr-dqn
JAX · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Sparse and Compressive Sensing Techniques