A Principled Path to Fitted Distributional Evaluation
Sungee Hong, Jiayi Wang, Zhengling Qi, Raymond K. W. Wong

TL;DR
This paper extends fitted Q-evaluation to distributional off-policy evaluation in reinforcement learning, providing a unified framework, new methods, and theoretical analysis, with empirical validation on diverse environments.
Contribution
It introduces fitted distributional evaluation (FDE), a principled framework for distributional OPE, along with new methods and convergence guarantees.
Findings
FDE methods outperform existing approaches in experiments on linear quadratic regulator (LQR) and Atari environments.
Theoretical convergence guarantees are established for FDE methods.
Abstract
In reinforcement learning, distributional off-policy evaluation (OPE) focuses on estimating the return distribution of a target policy using offline data collected under a different policy. This work focuses on extending the widely used fitted Q-evaluation -- developed for expectation-based reinforcement learning -- to the distributional OPE setting. We refer to this extension as fitted distributional evaluation (FDE). While only a few related approaches exist, there remains no unified framework for designing FDE methods. To fill this gap, we present a set of guiding principles for constructing theoretically grounded FDE methods. Building on these principles, we develop several new FDE methods with convergence analysis and provide theoretical justification for existing methods, even in non-tabular environments. Extensive experiments, including simulations on linear quadratic regulators…
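For context, the expectation-based procedure that FDE extends, fitted Q-evaluation, can be sketched in the tabular case as repeated regression onto bootstrapped targets. The following is a minimal illustration under stated assumptions, not the authors' method: the function name `fitted_q_evaluation`, the `(s, a, r, s_next)` tuple layout, the deterministic target policy, and the toy data are all hypothetical. A distributional variant would replace the scalar targets with return distributions and the squared-error regression with a suitable distributional loss.

```python
import numpy as np

def fitted_q_evaluation(transitions, policy, n_states, n_actions,
                        gamma=0.9, n_iters=200):
    """Tabular fitted Q-evaluation for a deterministic target policy.

    transitions: list of (s, a, r, s_next) tuples from a behavior policy.
    policy: integer array mapping each state to the target policy's action.
    """
    s, a, r, s_next = (np.asarray(x) for x in zip(*transitions))
    s, a, s_next = s.astype(int), a.astype(int), s_next.astype(int)
    r = r.astype(float)

    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        # Bootstrapped regression targets built from the previous iterate.
        targets = r + gamma * q[s_next, policy[s_next]]
        # In the tabular case, least-squares regression reduces to the
        # mean target for each observed (state, action) pair.
        sums = np.zeros_like(q)
        counts = np.zeros_like(q)
        np.add.at(sums, (s, a), targets)
        np.add.at(counts, (s, a), 1.0)
        # Unvisited (state, action) pairs keep a value of zero.
        q = np.divide(sums, counts, out=np.zeros_like(q), where=counts > 0)
    return q

# Toy offline dataset: two states, two actions, deterministic dynamics.
data = [(0, 0, 1.0, 1), (1, 0, 0.0, 0), (0, 1, 0.0, 0), (1, 1, 2.0, 1)]
target_policy = np.array([0, 0])  # always take action 0
q_hat = fitted_q_evaluation(data, target_policy, n_states=2, n_actions=2,
                            gamma=0.5)
print(q_hat)
```

On this toy dataset the iterates converge to the solution of the Bellman equations for the target policy, for example Q(0, 0) = 1 + 0.5 · Q(1, 0) and Q(1, 0) = 0.5 · Q(0, 0).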
Taxonomy
Topics: Evaluation and Performance Assessment
Methods: Sparse Evolutionary Training
