OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators

Allen Nie; Yash Chandak; Christina J. Yuan; Anirudhan Badrinath; Yannis Flet-Berliac; Emma Brunskill

arXiv:2405.17708 · cs.LG · November 4, 2024

Open Access

TL;DR

This paper introduces OPERA, an adaptive, estimator-agnostic method for offline policy evaluation that combines multiple estimators without explicit selection, ensuring consistency and improving policy selection in healthcare and robotics.

Contribution

The paper proposes a novel, adaptive blending algorithm for OPE that does not require explicit estimator selection and is proven to be consistent and reliable.
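To make the idea of blending estimators without picking one more concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it assumes a single-step (bandit-style) dataset of importance ratios `rho` and rewards `r`, uses two standard estimators (importance sampling and weighted importance sampling), estimates their error matrix with a bootstrap, and chooses weights that sum to one and minimize the estimated squared error. The function names, the bootstrap error proxy, and the `ridge` term are all illustrative assumptions.

```python
import numpy as np

def is_estimate(rho, r):
    # Ordinary importance sampling: mean of ratio-weighted rewards.
    return float(np.mean(rho * r))

def wis_estimate(rho, r):
    # Weighted (self-normalized) importance sampling.
    return float(np.sum(rho * r) / np.sum(rho))

def blend_estimators(rho, r, estimators, n_boot=200, ridge=1e-6, seed=0):
    """Hypothetical helper: blend estimators with weights that sum to 1.

    Assumption: the deviation of each bootstrap estimate from the
    full-data estimate is used as a proxy for estimator error.
    """
    rng = np.random.default_rng(seed)
    n = len(r)
    k = len(estimators)
    full = np.array([f(rho, r) for f in estimators])

    # Re-compute every estimator on bootstrap resamples of the data.
    boot = np.empty((n_boot, k))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        boot[b] = [f(rho[idx], r[idx]) for f in estimators]

    # Estimated error matrix; the ridge term keeps it invertible.
    dev = boot - full
    A = dev.T @ dev / n_boot + ridge * np.eye(k)

    # Minimize alpha^T A alpha subject to sum(alpha) = 1 (closed form).
    w = np.linalg.solve(A, np.ones(k))
    alpha = w / w.sum()
    return float(alpha @ full), alpha
```

Calling `blend_estimators(rho, r, [is_estimate, wis_estimate])` returns the blended value and the weight vector. Weights may be negative in this sketch, since only the sum-to-one constraint is enforced.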

Findings

01 Outperforms existing methods in healthcare policy evaluation

02 Effective in robotics decision-making tasks

03 Ensures consistency and desirable properties in policy evaluation

Abstract

Offline policy evaluation (OPE) allows us to estimate a new sequential decision-making policy's performance by leveraging historical interaction data collected from other policies. Evaluating a new policy online without a confident estimate of its performance can lead to costly, unsafe, or hazardous outcomes, especially in education and healthcare. Several OPE estimators have been proposed in the last decade, many of which have hyperparameters and require training. Unfortunately, it is still unclear how to choose the best OPE algorithm for each task and domain. In this paper, we propose a new algorithm that uses a statistical procedure to adaptively blend a set of OPE estimators given a dataset, without relying on explicit selection. We prove that our estimator is consistent and satisfies several desirable properties for policy evaluation. Additionally, we demonstrate that when…

Taxonomy

Topics: Advanced Causal Inference Techniques

Methods: Sparse Evolutionary Training