Distributional Offline Policy Evaluation with Predictive Error Guarantees

Runzhe Wu; Masatoshi Uehara; Wen Sun

arXiv:2302.09456 · cs.LG · January 1, 2024

TL;DR

This paper introduces Fitted Likelihood Estimation (FLE), an algorithm for distributional offline policy evaluation that leverages probabilistic generative models and provides theoretical guarantees on distribution closeness.

Contribution

The paper proposes FLE, a flexible MLE-based method for distributional offline policy evaluation with theoretical guarantees and practical validation using advanced generative models.

Findings

1. FLE learns return distributions that are close to the ground truth under total variation distance in the finite-horizon setting and under Wasserstein distance in the infinite-horizon discounted setting.

2. FLE with diffusion models effectively estimates complex multi-dimensional reward distributions.

3. Theoretical guarantees depend on data coverage and successful MLE training.
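
To make the two distances named in the findings concrete, here is a small illustrative sketch for discrete, one-dimensional distributions. The function names and the dictionary representation (value mapped to probability) are assumptions made for this example; they are not taken from the paper or its repository.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    p and q map each value to its probability (hypothetical representation).
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def wasserstein_1(p, q):
    """1-Wasserstein distance on the real line.

    In one dimension this equals the integral of |CDF_p - CDF_q|,
    which for discrete distributions is a finite sum over gaps
    between consecutive support points.
    """
    support = sorted(set(p) | set(q))
    cdf_gap = 0.0
    distance = 0.0
    for x, x_next in zip(support, support[1:]):
        cdf_gap += p.get(x, 0.0) - q.get(x, 0.0)
        distance += abs(cdf_gap) * (x_next - x)
    return distance


# Two point masses: total variation is already at its maximum of 1,
# while the Wasserstein distance also reflects how far apart they are.
near = ({0: 1.0}, {1: 1.0})
far = ({0: 1.0}, {10: 1.0})
print(total_variation(*near), wasserstein_1(*near))
print(total_variation(*far), wasserstein_1(*far))
```

Total variation only measures how much probability mass disagrees, whereas the Wasserstein distance also accounts for how far that mass has to move.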

Abstract

We study the problem of estimating the distribution of the return of a policy using an offline dataset that is not generated from the policy, i.e., distributional offline policy evaluation (OPE). We propose an algorithm called Fitted Likelihood Estimation (FLE), which conducts a sequence of Maximum Likelihood Estimation (MLE) steps and has the flexibility of integrating any state-of-the-art probabilistic generative model, as long as it can be trained via MLE. FLE can be used for both finite-horizon and infinite-horizon discounted settings where rewards can be multi-dimensional vectors. Our theoretical results show that for both finite-horizon and infinite-horizon discounted settings, FLE can learn distributions that are close to the ground truth under total variation distance and Wasserstein distance, respectively. Our theoretical results hold under the conditions that the offline data covers…
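
The "sequence of MLE" idea can be sketched in a toy finite-horizon setting. The code below is an illustration under strong assumptions, not the authors' implementation: states and actions are discrete, rewards are scalars, the generative model is a plain categorical distribution (whose MLE is the empirical frequency), and the training target at each step is assumed to be the observed reward plus a sample from the model fitted for the next step. The names `fit_categorical_mle` and `fle_finite_horizon` and the `(s, a, r, s_next)` tuple layout are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_categorical_mle(targets_by_key):
    """Categorical MLE per (state, action): the likelihood maximizer
    is the empirical frequency of each observed target value."""
    model = {}
    for key, targets in targets_by_key.items():
        counts = Counter(targets)
        n = len(targets)
        model[key] = {z: c / n for z, c in counts.items()}
    return model


def fle_finite_horizon(datasets, policy, horizon, seed=0):
    """Toy backward sequence of MLE fits (assumed form of the procedure).

    datasets[h] is a list of (s, a, r, s_next) tuples collected at step h.
    policy(s) returns the target policy's action in state s.
    Returns models[h][(s, a)] = {return_to_go: probability}.
    Assumes every (s_next, policy(s_next)) pair appears in the next
    step's data; otherwise the lookup below raises KeyError.
    """
    rng = random.Random(seed)
    models = [None] * horizon
    next_model = None  # None: return-to-go after the last step is 0
    for h in reversed(range(horizon)):
        targets = defaultdict(list)
        for s, a, r, s_next in datasets[h]:
            if next_model is None:
                y = 0
            else:
                dist = next_model[(s_next, policy(s_next))]
                values = list(dist)
                weights = [dist[v] for v in values]
                # Sample a return-to-go from the previously fitted model.
                y = rng.choices(values, weights=weights)[0]
            targets[(s, a)].append(r + y)
        models[h] = fit_categorical_mle(targets)
        next_model = models[h]
    return models


# One state, one action, reward 1 at every step, horizon 3.
data = [[(0, 0, 1, 0)] * 5 for _ in range(3)]
models = fle_finite_horizon(data, policy=lambda s: 0, horizon=3)
print(models[0][(0, 0)])
```

In this sketch the categorical fit stands in for whatever generative model is used; per the abstract, any model trainable via MLE could take its place.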

Code & Models

Repositories

ziqian2000/fitted-likelihood-estimation (PyTorch, official)

Videos

Distributional Offline Policy Evaluation with Predictive Error Guarantees · SlidesLive

Taxonomy

Topics: Gaussian Processes and Bayesian Inference

Methods: Test · Diffusion