Off-policy estimation with adaptively collected data: the power of online learning

Jeonghwan Lee; Cong Ma

arXiv:2411.12786·stat.ML·November 21, 2024

TL;DR

This paper develops non-asymptotic bounds and an online learning framework for off-policy estimation of treatment effects using adaptively collected data, applicable to contextual bandits and causal inference.

Contribution

The paper introduces generic upper bounds on the mean-squared error of AIPW estimators and a reduction scheme that uses online learning to estimate the treatment effect from adaptively collected data.
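
For orientation, one standard form of an AIPW estimator is written out below for the special case of the average treatment effect with a binary action. The notation is an assumption made for this sketch and is not taken from the paper: $A_t \in \{0,1\}$ is the action at round $t$, $X_t$ the context, $Y_t$ the outcome, $\pi_t$ the logging propensity, and $\hat{\mu}_t$ the outcome estimate, where both $\pi_t$ and $\hat{\mu}_t$ may depend only on data observed before round $t$.

```latex
\hat{\tau}_n
  = \frac{1}{n} \sum_{t=1}^{n}
    \left[
      \hat{\mu}_t(X_t, 1) - \hat{\mu}_t(X_t, 0)
      + \frac{2A_t - 1}{\pi_t(A_t \mid X_t)}
        \bigl( Y_t - \hat{\mu}_t(X_t, A_t) \bigr)
    \right]
```

Because $\hat{\mu}_t$ and $\pi_t$ are fixed given the past, each summand has conditional mean equal to the treatment effect at $X_t$, whatever $\hat{\mu}_t$ is. The quality of the sequence $\hat{\mu}_1, \dots, \hat{\mu}_n$ therefore affects the variance rather than the bias, which is why an error term for these estimates appears in a mean-squared-error bound.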

Findings

01

Establishes mean-squared-error bounds for AIPW estimators that depend on a sequentially weighted error between the treatment effect and its estimates.

02

Proposes a general reduction scheme that uses online learning to produce the sequence of treatment effect estimates.

03

Demonstrates instance-dependent optimality of AIPW estimators whose estimates come from no-regret algorithms.
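
The findings above can be illustrated with a minimal sketch, which is not the paper's algorithm. It assumes a hypothetical two-arm problem with no contexts, Bernoulli outcomes, an epsilon-greedy logging policy, and running-mean outcome estimates (the follow-the-leader update for squared loss) standing in for a generic online learner. The function name `aipw_ate_online` and all of its parameters are invented for illustration.

```python
import random


def aipw_ate_online(n=20000, seed=0, eps=0.1, true_means=(0.3, 0.5)):
    """AIPW estimate of the ATE mu(1) - mu(0) from adaptively collected data.

    Hypothetical setup (assumed, not from the paper): two arms, no contexts,
    Bernoulli outcomes, epsilon-greedy logging, running-mean outcome estimates.
    """
    rng = random.Random(seed)
    sums = [0.0, 0.0]   # running outcome sums per arm
    counts = [0, 0]     # number of pulls per arm
    total = 0.0
    for _ in range(n):
        # Outcome estimates use only data observed before this round.
        mu_hat = [sums[a] / counts[a] if counts[a] else 0.0 for a in (0, 1)]

        # Logging propensities depend on the history (adaptive collection).
        greedy = 1 if mu_hat[1] >= mu_hat[0] else 0
        prop = [eps / 2, eps / 2]
        prop[greedy] += 1 - eps

        # Draw the action from the logging policy, then observe the outcome.
        a = 1 if rng.random() < prop[1] else 0
        y = 1.0 if rng.random() < true_means[a] else 0.0

        # AIPW score: plug-in difference plus propensity-weighted residual.
        sign = 1.0 if a == 1 else -1.0
        total += (mu_hat[1] - mu_hat[0]) + sign * (y - mu_hat[a]) / prop[a]

        # Online update of the outcome estimate for the pulled arm.
        sums[a] += y
        counts[a] += 1
    return total / n


if __name__ == "__main__":
    print(aipw_ate_online())  # true ATE in this simulation is 0.5 - 0.3 = 0.2
```

In this sketch the estimate stays centered on the true effect even though the data are collected adaptively, because the outcome estimates and propensities at each round depend only on the past. Better online estimates shrink the residual term and hence the variance.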

Abstract

We consider estimation of a linear functional of the treatment effect using adaptively collected data. This task finds a variety of applications including the off-policy evaluation (OPE) in contextual bandits, and estimation of the average treatment effect (ATE) in causal inference. While a certain class of augmented inverse propensity weighting (AIPW) estimators enjoys desirable asymptotic properties including the semi-parametric efficiency, much less is known about their non-asymptotic theory with adaptively collected data. To fill in the gap, we first establish generic upper bounds on the mean-squared error of the class of AIPW estimators that crucially depends on a sequentially weighted error between the treatment effect and its estimates. Motivated by this, we also propose a general reduction scheme that allows one to produce a sequence of estimates for…

Videos

Off-policy estimation with adaptively collected data: the power of online learning · SlidesLive

Taxonomy

Topics: Economic Policies and Impacts · Climate Change Policy and Economics