Doubly Robust Estimator for Off-Policy Evaluation with Large Action Spaces

Tatsuhiro Shimizu; Laura Forastiere

arXiv:2308.03443 · stat.ML · December 15, 2023

TL;DR

This paper introduces a Marginalized Doubly Robust (MDR) estimator for off-policy evaluation in large action spaces. The estimator is unbiased under weaker assumptions than existing methods such as MIPS and also has lower variance than MIPS.

Contribution

The paper proposes a novel MDR estimator that is unbiased under weaker assumptions than MIPS and empirically outperforms existing estimators.

Findings

01

MDR estimator reduces variance compared to MIPS.

02

MDR remains unbiased under weaker assumptions than MIPS.

03

Empirical results show MDR outperforms existing estimators.

Abstract

We study Off-Policy Evaluation (OPE) in contextual bandit settings with large action spaces. Benchmark estimators face a severe bias-variance tradeoff: parametric approaches suffer from bias because the correct model is difficult to specify, whereas importance-weighting approaches suffer from high variance. To overcome these limitations, Marginalized Inverse Propensity Scoring (MIPS) was proposed to mitigate the estimator's variance via action embeddings. However, MIPS is unbiased only under the no-direct-effect assumption, which requires the action embedding to fully mediate the effect of an action on the reward. To remove the dependence on this unrealistic assumption, we propose a Marginalized Doubly Robust (MDR) estimator. Theoretical analysis shows that the proposed estimator is unbiased under weaker assumptions than MIPS while having lower variance than MIPS. The empirical…
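To make the abstract's contrast between MIPS and a doubly robust variant concrete, here is a minimal stdlib-only Python sketch for discrete contexts, actions, and embeddings. It assumes the embedding distribution p(e | x, a) is known and that every logged embedding has positive marginal probability under the logging policy. All names (`marginal_weight`, `mips`, `marginalized_dr`, the type aliases) are hypothetical, and `marginalized_dr` is only one natural way to combine a reward model q_hat with MIPS-style marginal weights; it is not claimed to be the exact MDR estimator from the paper, which may define the weights or the reward model differently.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type aliases for a discrete toy setting.
Policy = Callable[[int, int], float]          # pi(a, x): probability of action a in context x
EmbedDist = Callable[[int, int, int], float]  # p_e(e, x, a): P(embedding e | context x, action a), assumed known
RewardModel = Callable[[int, int], float]     # q_hat(x, a): estimated expected reward
Log = Sequence[Tuple[int, int, int, float]]   # logged (context x, action a, embedding e, reward r)


def marginal_embedding_prob(pi: Policy, p_e: EmbedDist, x: int, e: int,
                            actions: Sequence[int]) -> float:
    """p(e | x, pi) = sum_a pi(a | x) * p(e | x, a)."""
    return sum(pi(a, x) * p_e(e, x, a) for a in actions)


def marginal_weight(pi: Policy, pi0: Policy, p_e: EmbedDist, x: int, e: int,
                    actions: Sequence[int]) -> float:
    """MIPS weight w(x, e) = p(e | x, pi) / p(e | x, pi0).

    Assumes p(e | x, pi0) > 0 for every logged embedding (common support)."""
    return (marginal_embedding_prob(pi, p_e, x, e, actions)
            / marginal_embedding_prob(pi0, p_e, x, e, actions))


def mips(logs: Log, pi: Policy, pi0: Policy, p_e: EmbedDist,
         actions: Sequence[int]) -> float:
    """MIPS estimate: (1/n) * sum_i w(x_i, e_i) * r_i."""
    return sum(marginal_weight(pi, pi0, p_e, x, e, actions) * r
               for x, _a, e, r in logs) / len(logs)


def marginalized_dr(logs: Log, pi: Policy, pi0: Policy, p_e: EmbedDist,
                    q_hat: RewardModel, actions: Sequence[int]) -> float:
    """Hypothetical DR-style estimator with marginal weights (not necessarily the paper's MDR):
    (1/n) * sum_i [ sum_a pi(a | x_i) q_hat(x_i, a) + w(x_i, e_i) * (r_i - q_hat(x_i, a_i)) ]."""
    total = 0.0
    for x, a, e, r in logs:
        direct = sum(pi(b, x) * q_hat(x, b) for b in actions)  # model-based (direct method) term
        w = marginal_weight(pi, pi0, p_e, x, e, actions)       # embedding-level importance weight
        total += direct + w * (r - q_hat(x, a))                # weighted residual correction
    return total / len(logs)


if __name__ == "__main__":
    # Toy data: two actions, identity embeddings (e == a), uniform logging policy,
    # and a target policy that always picks action 1 (whose reward is 1.0).
    actions = [0, 1]
    pi0 = lambda a, x: 0.5
    pi = lambda a, x: 1.0 if a == 1 else 0.0
    p_e = lambda e, x, a: 1.0 if e == a else 0.0
    logs = [(0, 0, 0, 0.0), (0, 1, 1, 1.0)]
    print(mips(logs, pi, pi0, p_e, actions))                                    # prints 1.0
    print(marginalized_dr(logs, pi, pi0, p_e, lambda x, a: float(a), actions))  # prints 1.0
```

With identity embeddings (e = a) the marginal weight reduces to the ordinary importance weight pi(a | x) / pi0(a | x), and with q_hat identically zero the DR-style variant reduces to `mips`. A reward model that tracks the true rewards shrinks the weighted residual term, which is the usual mechanism by which doubly robust estimators reduce variance.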

Code & Models

Repositories

tatsu432/DR-estimator-OPE-large-action (official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Advanced Causal Inference Techniques