TL;DR
This paper analyzes the statistical efficiency of distributional reinforcement learning, providing sample complexity bounds for estimators of the return distribution under various metrics and characterizing their asymptotic behavior.
Contribution
It introduces a sample-efficient estimator for return distributions and studies its asymptotic properties, advancing the theoretical understanding of distributional RL.
Findings
Sample complexity bounds for Wasserstein, Kolmogorov, and total variation metrics.
Weak convergence of the estimator to a Gaussian process.
Unified approach for statistical inference of distributional RL.
Abstract
In this paper, we study distributional reinforcement learning from the perspective of statistical efficiency. We investigate distributional policy evaluation, aiming to estimate the complete return distribution attained by a given policy. We use the certainty-equivalence method to construct our estimator, assuming a generative model is available. In this setting, we bound the dataset size needed to guarantee that the $p$-Wasserstein metric between the estimator and the true return distribution is less than a given tolerance with high probability. This implies the distributional policy evaluation problem can be solved with sample efficiency. Also, we show that under different mild assumptions a dataset of size $\widetilde…
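To make the certainty-equivalence idea concrete, here is a minimal sketch rather than the paper's construction. It assumes a small tabular problem in which the policy is already folded into a transition matrix `P` and a deterministic per-state reward vector `rewards` (both hypothetical toy values). The generative model is queried a fixed number of times per state to form an empirical model `P_hat`. The return distribution under each model is then approximated by truncated Monte Carlo rollouts, and the two are compared in 1-Wasserstein distance. All function names are illustrative.

```python
import numpy as np

def empirical_transition_matrix(P, n_samples, rng):
    """Certainty-equivalence step: query the generative model n_samples times
    per state and use the empirical next-state frequencies as the model."""
    n_states = P.shape[0]
    P_hat = np.zeros_like(P, dtype=float)
    for s in range(n_states):
        next_states = rng.choice(n_states, size=n_samples, p=P[s])
        P_hat[s] = np.bincount(next_states, minlength=n_states) / n_samples
    return P_hat

def sample_returns(P, rewards, gamma, start, n_rollouts, horizon, rng):
    """Monte Carlo samples of the horizon-truncated discounted return."""
    n_states = P.shape[0]
    cdf = np.cumsum(P, axis=1)  # row-wise CDFs for inverse-CDF sampling
    states = np.full(n_rollouts, start)
    returns = np.zeros(n_rollouts)
    discount = 1.0
    for _ in range(horizon):
        returns += discount * rewards[states]
        u = rng.random(n_rollouts)
        # Next state = number of CDF entries strictly below u.
        states = (u[:, None] > cdf[states]).sum(axis=1)
        states = np.minimum(states, n_states - 1)  # guard against rounding
        discount *= gamma
    return returns

def wasserstein1(x, y):
    """1-Wasserstein distance between two equal-size empirical samples."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

rng = np.random.default_rng(0)
# Hypothetical 3-state Markov reward process induced by a fixed policy.
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
rewards = np.array([0.0, 0.5, 1.0])
gamma = 0.9

P_hat = empirical_transition_matrix(P, n_samples=2000, rng=rng)
true_returns = sample_returns(P, rewards, gamma, 0, 5000, 100, rng)
est_returns = sample_returns(P_hat, rewards, gamma, 0, 5000, 100, rng)
print("W1 between return samples:", wasserstein1(true_returns, est_returns))
```

The printed distance mixes model-estimation error with Monte Carlo noise from the rollouts, so it only illustrates the quantity that the sample complexity bounds control. Increasing `n_samples` should shrink the model-estimation part.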
Taxonomy
Methods: Gaussian Process
