On Policy Evaluation Algorithms in Distributional Reinforcement Learning

Julian Gerstenberg; Ralph Neininger; Denis Spiegel

arXiv:2407.14175 · stat.ML · July 22, 2024

TL;DR

This paper presents a new class of distributional dynamic programming algorithms for approximating return distributions in policy evaluation. The algorithms handle arbitrary probabilistic reward mechanisms, including continuous, heavy-tailed rewards, and a plain instance of the class comes with proven error bounds.

Contribution

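The paper introduces distributional dynamic programming algorithms for MDPs with arbitrary probabilistic reward mechanisms, including continuous reward distributions with unbounded, potentially heavy-tailed support. It proves error bounds for a plain instance of the class and shows that the algorithms also approximate return densities where these exist.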

Findings

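01. For a plain instance of the algorithm class, error bounds are proven in Wasserstein and Kolmogorov–Smirnov distances.

02. For return distributions that have probability density functions, the algorithms approximate these densities, with error bounds in supremum norm.

03. Algorithms built on quantile-spline discretizations show promising results in simulations.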
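To make the two distances in the first finding concrete, here is a minimal sketch of how they can be computed between two empirical samples. It is an illustration only and is not taken from the paper: the function names `wasserstein1` and `kolmogorov_smirnov` are hypothetical, the Wasserstein distance is taken of order 1, and both samples are assumed to have the same size.

```python
import numpy as np

def wasserstein1(x, y):
    """Wasserstein-1 distance between two empirical samples of equal size.

    For equal-size samples this is the mean absolute gap between
    the sorted values (the order statistics).
    """
    x, y = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(x - y)))

def kolmogorov_smirnov(x, y):
    """Kolmogorov-Smirnov distance: largest gap between the empirical CDFs."""
    x, y = np.sort(x), np.sort(y)
    # The supremum of |F - G| is attained at one of the sample points.
    grid = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, grid, side="right") / len(x)
    cdf_y = np.searchsorted(y, grid, side="right") / len(y)
    return float(np.max(np.abs(cdf_x - cdf_y)))
```

The Wasserstein distance is sensitive to how far probability mass has to be moved, while the Kolmogorov–Smirnov distance only measures the largest vertical gap between the two distribution functions, so the two can rank approximations differently.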

Abstract

We introduce a novel class of algorithms to efficiently approximate the unknown return distributions in policy evaluation problems from distributional reinforcement learning (DRL). The proposed distributional dynamic programming algorithms are suitable for underlying Markov decision processes (MDPs) having an arbitrary probabilistic reward mechanism, including continuous reward distributions with unbounded support being potentially heavy-tailed. For a plain instance of our proposed class of algorithms we prove error bounds, both within Wasserstein and Kolmogorov–Smirnov distances. Furthermore, for return distributions having probability density functions the algorithms yield approximations for these densities; error bounds are given within supremum norm. We introduce the concept of quantile-spline discretizations to come up with algorithms showing promising results in simulation…
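As a rough illustration of what distributional dynamic programming for policy evaluation involves, the sketch below iterates the distributional Bellman equation G(s) = R(s) + γ·G(S') (equality in distribution) on a small Markov reward process. It is not the paper's algorithm and does not implement quantile-spline discretizations. It assumes a fixed policy already folded into the transition matrix `P`, Gaussian rewards that depend only on the current state and are independent of the next-state return, and a plain projection onto equally weighted quantile atoms. The names `quantile_project` and `evaluate_policy` are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def quantile_project(particles, weights, m):
    """Project a weighted particle distribution onto m equally weighted
    atoms located at the quantile levels (2i + 1) / (2m)."""
    order = np.argsort(particles)
    z, cum = particles[order], np.cumsum(weights[order])
    levels = (2 * np.arange(m) + 1) / (2 * m)
    # First particle whose cumulative weight reaches each level.
    idx = np.minimum(np.searchsorted(cum, levels), len(z) - 1)
    return z[idx]

def evaluate_policy(P, reward_means, reward_std, gamma, m=50, n_iter=60):
    """Approximate the return distribution of every state by m quantile atoms.

    Assumption (not from the paper): the reward at state s is
    Normal(reward_means[s], reward_std**2), discretized by its own quantiles.
    """
    n_states = P.shape[0]
    levels = (2 * np.arange(m) + 1) / (2 * m)
    std_normal = NormalDist()
    base = np.array([std_normal.inv_cdf(t) for t in levels])
    atoms = np.zeros((n_states, m))  # start from the point mass at 0
    for _ in range(n_iter):
        new_atoms = np.empty_like(atoms)
        for s in range(n_states):
            r = reward_means[s] + reward_std * base  # discretized reward
            parts, wts = [], []
            for s2 in range(n_states):
                # All sums r_j + gamma * g_k(s2), weighted by P[s, s2].
                sums = (r[:, None] + gamma * atoms[s2][None, :]).ravel()
                parts.append(sums)
                wts.append(np.full(sums.size, P[s, s2] / sums.size))
            new_atoms[s] = quantile_project(
                np.concatenate(parts), np.concatenate(wts), m
            )
        atoms = new_atoms
    return atoms
```

Calling `evaluate_policy(P, reward_means, 1.0, 0.5)` with a row-stochastic NumPy array `P` returns an array of shape `(n_states, m)` whose rows are sorted quantile atoms. A simple sanity check is to compare the row means with the classical value function obtained from `np.linalg.solve(np.eye(n_states) - gamma * P, reward_means)`.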
