Estimation and Inference in Distributional Reinforcement Learning

Liangyu Zhang; Yang Peng; Jiadong Liang; Wenhao Yang; Zhihua Zhang

arXiv:2309.17262 · stat.ML · November 13, 2025

TL;DR

This paper analyzes the statistical efficiency of distributional reinforcement learning, deriving sample complexity bounds and characterizing the asymptotic behavior of estimators of the return distribution under several metrics.

Contribution

It analyzes a certainty-equivalence estimator of the return distribution, shows that it is sample-efficient when a generative model is available, and studies its asymptotic properties, advancing the theoretical understanding of distributional RL.
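As a sketch of what "certainty-equivalence" means here, assume finite state and action spaces and deterministic rewards $r(s,a)$ (assumptions for illustration, not necessarily the paper's exact setting). The estimator replaces the unknown transition kernel $P$ by an empirical kernel $\hat{P}$ built from $n$ generative-model draws per state-action pair, and takes $\hat{\eta}^{\pi}$ to be the fixed point of the resulting distributional Bellman equation:

$$
\hat{P}(s' \mid s,a) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\{s'_i = s'\}, \quad s'_1,\dots,s'_n \sim P(\cdot \mid s,a),
$$

$$
\hat{\eta}^{\pi}(s) = \sum_{a \in \mathcal{A}} \pi(a \mid s) \sum_{s' \in \mathcal{S}} \hat{P}(s' \mid s,a)\, \big(b_{r(s,a),\gamma}\big)_{\#}\, \hat{\eta}^{\pi}(s'), \qquad b_{r,\gamma}(x) = r + \gamma x.
$$

Here $(b_{r,\gamma})_{\#}\nu$ is the pushforward of $\nu$ under $x \mapsto r + \gamma x$. The true $\eta^{\pi}$ solves the same equation with $P$ in place of $\hat{P}$, and both fixed points are unique because the distributional Bellman operator is a $\gamma$-contraction in the supremum $p$-Wasserstein metric.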

Findings

01. Sample complexity bounds for the Wasserstein, Kolmogorov, and total variation metrics.

02. Weak convergence of the estimator to a Gaussian process.

03. A unified approach to statistical inference in distributional RL.
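To make the three metrics listed above concrete, here is a minimal NumPy sketch for finitely supported distributions on the real line (for instance, empirical return distributions at a single state). The function names are hypothetical. It covers only the $p = 1$ case of the Wasserstein metric, using the identity that $W_1$ equals the $L^1$ distance between CDFs; for general $p$ one would compare quantile functions instead, $W_p^p(\mu,\nu) = \int_0^1 |F_\mu^{-1}(u) - F_\nu^{-1}(u)|^p\,du$.

```python
import numpy as np

def _on_common_grid(x, px, y, py):
    """Place two finitely supported distributions on one sorted support."""
    grid = np.union1d(x, y)  # sorted, unique support points
    fx = np.zeros(grid.shape, dtype=float)
    fy = np.zeros(grid.shape, dtype=float)
    # np.add.at accumulates mass correctly even if x or y repeats a point.
    np.add.at(fx, np.searchsorted(grid, x), px)
    np.add.at(fy, np.searchsorted(grid, y), py)
    return grid, fx, fy

def wasserstein_1(x, px, y, py):
    """W_1 on the real line: integral of |F(t) - G(t)| over t."""
    grid, fx, fy = _on_common_grid(x, px, y, py)
    gap = np.abs(np.cumsum(fx) - np.cumsum(fy))
    # Both CDFs are constant between consecutive grid points.
    return float(np.sum(gap[:-1] * np.diff(grid)))

def kolmogorov(x, px, y, py):
    """Kolmogorov distance: sup over t of |F(t) - G(t)|."""
    _, fx, fy = _on_common_grid(x, px, y, py)
    return float(np.max(np.abs(np.cumsum(fx) - np.cumsum(fy))))

def total_variation(x, px, y, py):
    """Total variation: half the L1 distance between the mass functions."""
    _, fx, fy = _on_common_grid(x, px, y, py)
    return float(0.5 * np.sum(np.abs(fx - fy)))
```

For point masses at 0 and 2, `wasserstein_1([0.0], [1.0], [2.0], [1.0])` gives 2.0, while the Kolmogorov and total variation distances are both 1.0: the Wasserstein metric reflects how far apart the supports are, whereas the other two saturate as soon as the distributions stop overlapping.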

Abstract

In this paper, we study distributional reinforcement learning from the perspective of statistical efficiency. We investigate distributional policy evaluation, which aims to estimate the complete return distribution (denoted $\eta^{\pi}$) attained by a given policy $\pi$. We use the certainty-equivalence method to construct our estimator $\hat{\eta}^{\pi}$, assuming a generative model is available. In this setting, we need a dataset of size $O\big(\frac{|\mathcal{S}||\mathcal{A}|}{\varepsilon^{2p}(1-\gamma)^{2p+2}}\big)$ to guarantee that the $p$-Wasserstein metric between $\hat{\eta}^{\pi}$ and $\eta^{\pi}$ is less than $\varepsilon$ with high probability. This implies that the distributional policy evaluation problem can be solved sample-efficiently. We also show that, under different mild assumptions, a dataset of size $\widetilde…
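The certainty-equivalence construction described in the abstract can be sketched in NumPy as follows. This is a hypothetical toy implementation, not the authors' code: it assumes a finite MDP with deterministic rewards in [0, 1], simulates the generative model with a known kernel `P`, estimates `P_hat` from `n` draws per state-action pair, and then approximates the return distribution of the empirical MDP by truncated Monte Carlo rollouts. The rollout step is a swapped-in approximation (the estimator itself is the exact return distribution of the empirical MDP), so it adds Monte Carlo and truncation error on top of the estimation error the paper analyzes.

```python
import numpy as np

# Hypothetical toy setup: S states, A actions, deterministic rewards r[s, a]
# in [0, 1], stochastic policy pi[s, a]. The true kernel P (shape (S, A, S))
# is used only to simulate generative-model queries.

def estimate_transition_model(P, n, rng):
    """Certainty-equivalence step: draw n next states per (s, a) and count them."""
    S, A, _ = P.shape
    P_hat = np.empty_like(P, dtype=float)
    for s in range(S):
        for a in range(A):
            P_hat[s, a] = rng.multinomial(n, P[s, a]) / n
    return P_hat

def _sample_rows(probs, rng):
    """Draw one categorical index per row of `probs` by inverse-CDF sampling."""
    cdf = np.cumsum(probs, axis=1)
    u = rng.random((probs.shape[0], 1))
    idx = (u > cdf).sum(axis=1)
    return np.minimum(idx, probs.shape[1] - 1)  # guard against round-off

def sample_returns(P_hat, r, pi, gamma, s0, num_rollouts, horizon, rng):
    """Monte Carlo approximation of the return distribution of the empirical MDP.

    Rollouts are truncated after `horizon` steps, so with rewards in [0, 1]
    each sampled return is off by at most gamma**horizon / (1 - gamma).
    """
    states = np.full(num_rollouts, s0)
    returns = np.zeros(num_rollouts)
    discount = 1.0
    for _ in range(horizon):
        actions = _sample_rows(pi[states], rng)
        returns += discount * r[states, actions]
        discount *= gamma
        states = _sample_rows(P_hat[states, actions], rng)
    return returns

# Toy usage: two states, two actions, uniform policy.
rng = np.random.default_rng(0)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 0.0], [0.0, 0.5]])
pi = np.full((2, 2), 0.5)
P_hat = estimate_transition_model(P, n=1000, rng=rng)
returns = sample_returns(P_hat, r, pi, gamma=0.9, s0=0,
                         num_rollouts=5000, horizon=100, rng=rng)
```

In this sketch the total number of generative-model queries is $n \cdot |\mathcal{S}| \cdot |\mathcal{A}|$, matching the $|\mathcal{S}||\mathcal{A}|$ factor in the bound. Ignoring constants, the stated bound grows by a factor of $2^{2p}$ when $\varepsilon$ is halved, and by a factor of $10^{2p+2}$ (that is, $10^4$ for $p = 1$) when the effective horizon $1/(1-\gamma)$ grows from 10 to 100.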

Code & Models

Repositories

zhangliangyu32/estimationandinferencedistributionalrl
PyTorch · Official

Taxonomy

Methods: Gaussian Process