OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators
Allen Nie, Yash Chandak, Christina J. Yuan, Anirudhan Badrinath, Yannis Flet-Berliac, Emma Brunskill

TL;DR
This paper introduces OPERA, an adaptive, estimator-agnostic method for offline policy evaluation that combines multiple estimators without explicit selection, ensuring consistency and improving policy selection in healthcare and robotics.
Contribution
The paper proposes a novel, adaptive blending algorithm for OPE that does not require explicit estimator selection and is proven to be consistent and reliable.
Findings
Outperforms existing methods in healthcare policy evaluation
Effective in robotics decision-making tasks
Ensures consistency and desirable properties in policy evaluation
Abstract
Offline policy evaluation (OPE) allows us to evaluate and estimate a new sequential decision-making policy's performance by leveraging historical interaction data collected from other policies. Evaluating a new policy online without a confident estimate of its performance can lead to costly, unsafe, or hazardous outcomes, especially in education and healthcare. Several OPE estimators have been proposed in the last decade, many of which have hyperparameters and require training. Unfortunately, how to choose the best OPE algorithm for each task and domain is still unclear. In this paper, we propose a new algorithm that adaptively blends a set of OPE estimators given a dataset without relying on an explicit selection using a statistical procedure. We prove that our estimator is consistent and satisfies several desirable properties for policy evaluation. Additionally, we demonstrate that when…
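To make the idea of blending estimators concrete, here is a minimal sketch, not the authors' implementation. It assumes the blend is a weighted sum of the individual estimates, with weights that sum to one and minimize a bootstrap proxy for squared error. The excerpt above does not spell out the statistical procedure, so the bootstrap proxy, the helper name `blend_weights`, the two toy estimators, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def is_estimate(d):
    """Ordinary importance sampling: mean of (importance ratio * return)."""
    return float(np.mean(d[:, 0] * d[:, 1]))


def wis_estimate(d):
    """Weighted (self-normalized) importance sampling."""
    return float(np.sum(d[:, 0] * d[:, 1]) / np.sum(d[:, 0]))


def blend_weights(estimators, data, n_boot=200, ridge=1e-8, seed=0):
    """Hypothetical helper: weights that sum to 1 and minimize w^T A w,
    where A is a bootstrap proxy for the estimators' error matrix."""
    rng = np.random.default_rng(seed)
    n, k = len(data), len(estimators)
    full = np.array([est(data) for est in estimators])

    # Deviation of each bootstrap estimate from the full-data estimate.
    # This is a crude error proxy (an assumption of this sketch).
    devs = np.empty((n_boot, k))
    for b in range(n_boot):
        sample = data[rng.integers(0, n, size=n)]
        devs[b] = np.array([est(sample) for est in estimators]) - full

    A = devs.T @ devs / n_boot + ridge * np.eye(k)  # K x K, positive definite

    # Closed-form minimizer of w^T A w subject to sum(w) = 1:
    # w = A^{-1} 1 / (1^T A^{-1} 1).
    w = np.linalg.solve(A, np.ones(k))
    w /= w.sum()
    return w, float(w @ full)


# Synthetic per-trajectory data: column 0 = importance ratio, column 1 = return.
rng = np.random.default_rng(42)
data = np.column_stack([
    rng.lognormal(mean=0.0, sigma=0.5, size=500),
    rng.normal(loc=1.0, scale=1.0, size=500),
])

weights, blended = blend_weights([is_estimate, wis_estimate], data)
print("weights:", weights, "blended estimate:", blended)
```

In this sketch the weights are only constrained to sum to one, so individual weights may be negative or exceed one when the estimators are highly correlated. The small ridge term is there purely to keep the linear solve well conditioned.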
Taxonomy
Topics: Advanced Causal Inference Techniques
Methods: Sparse Evolutionary Training
