Distributional Offline Policy Evaluation with Predictive Error Guarantees
Runzhe Wu, Masatoshi Uehara, Wen Sun

TL;DR
This paper introduces Fitted Likelihood Estimation (FLE), an algorithm for distributional offline policy evaluation that leverages probabilistic generative models and provides theoretical guarantees on distribution closeness.
Contribution
The paper proposes FLE, a flexible MLE-based method for distributional offline policy evaluation with theoretical guarantees and practical validation using advanced generative models.
Findings
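FLE learns return distributions that are close to the ground truth: under total variation distance in the finite-horizon setting, and under Wasserstein distance in the infinite-horizon discounted setting.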
FLE with diffusion models effectively estimates complex multi-dimensional reward distributions.
Theoretical guarantees depend on data coverage and successful MLE training.
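To make the two distances named in the findings concrete, here is a minimal Python sketch. It is an illustration only and not taken from the paper: the helper names `total_variation` and `wasserstein1_1d` are hypothetical, the total variation helper assumes discrete distributions on a shared support, and the Wasserstein helper assumes the order-1 distance between one-dimensional samples of equal size (the summary does not say which order the paper uses).

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between two discrete distributions
    given as probability vectors over the same support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

def wasserstein1_1d(x, y):
    """Order-1 Wasserstein distance between two 1-D empirical samples
    of equal size, computed by matching sorted values."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return np.abs(x - y).mean()

# Two return distributions over the outcomes {0, 1}.
tv = total_variation([0.5, 0.5], [0.9, 0.1])

# Two samples of returns; the second is the first shifted by 1.
w1 = wasserstein1_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])

print(tv, w1)
```

In this toy case the total variation distance is 0.4 and the Wasserstein distance is 1, up to floating-point rounding. Total variation only compares probability mass and ignores how far apart outcomes are, whereas the Wasserstein distance grows with the size of the shift.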
Abstract
We study the problem of estimating the distribution of the return of a policy using an offline dataset that is not generated from the policy, i.e., distributional offline policy evaluation (OPE). We propose an algorithm called Fitted Likelihood Estimation (FLE), which performs a sequence of Maximum Likelihood Estimation (MLE) steps and has the flexibility of integrating any state-of-the-art probabilistic generative model, as long as it can be trained via MLE. FLE can be used for both finite-horizon and infinite-horizon discounted settings where rewards can be multi-dimensional vectors. Our theoretical results show that for the finite-horizon and infinite-horizon discounted settings, FLE can learn distributions that are close to the ground truth under total variation distance and Wasserstein distance, respectively. Our theoretical results hold under the conditions that the offline data covers…
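The "sequence of MLE" idea can be sketched for a small finite-horizon, tabular case. The code below is a hedged illustration and not the authors' implementation. It assumes a backward pass over time steps in which each step fits a conditional return model to targets formed as the observed reward plus a return sampled from the next step's model under the target policy. A categorical model, whose MLE is the empirical frequency, stands in for the generative model. All names (`fit_categorical_mle`, `sample_return`, `fitted_likelihood_sketch`) and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter, defaultdict

def fit_categorical_mle(keys, targets):
    """MLE of a conditional categorical model: empirical frequencies
    of each target value for each (state, action) key."""
    counts = defaultdict(Counter)
    for key, z in zip(keys, targets):
        counts[key][z] += 1
    return {
        key: {z: n / sum(c.values()) for z, n in c.items()}
        for key, c in counts.items()
    }

def sample_return(model, key, rng):
    """Draw one return from the fitted model at a (state, action) key.
    Raises KeyError if the data never covered this key."""
    values, probs = zip(*model[key].items())
    return values[rng.choice(len(values), p=probs)]

def fitted_likelihood_sketch(data, policy, horizon, seed=0):
    """data[h] is a list of (s, a, r, s_next) transitions at step h.
    Returns one conditional return model per step, fitted backward."""
    rng = np.random.default_rng(seed)
    models = [None] * horizon
    for h in reversed(range(horizon)):
        keys, targets = [], []
        for s, a, r, s_next in data[h]:
            if h == horizon - 1:
                z = r  # last step: the return is just the reward
            else:
                # Assumed target: reward plus a sampled future return
                # under the target policy's action at s_next.
                a_next = policy(s_next)
                z = r + sample_return(models[h + 1], (s_next, a_next), rng)
            keys.append((s, a))
            targets.append(z)
        models[h] = fit_categorical_mle(keys, targets)
    return models

# Toy offline data: one state (0), two actions, horizon 2.
# The behavior data includes action 0, but the target policy always picks 1.
data = [
    [(0, 0, 2, 0), (0, 0, 2, 0), (0, 1, 0, 0)],  # step 0
    [(0, 1, 1, 0), (0, 1, 2, 0), (0, 0, 5, 0)],  # step 1
]
models = fitted_likelihood_sketch(data, policy=lambda s: 1, horizon=2)
print(models[0])
```

In this toy run, the step-1 model for the target action puts equal mass on returns 1 and 2, so every step-0 target for the pair (0, 0) is either 3 or 4. The `KeyError` in `sample_return` is a crude stand-in for the data-coverage condition: if the dataset never contains the state-action pairs the target policy visits, the next-step model has nothing to sample from.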
Taxonomy
Topics: Gaussian Processes and Bayesian Inference
Methods: Test · Diffusion
