Off-policy estimation with adaptively collected data: the power of online learning
Jeonghwan Lee, Cong Ma

TL;DR
This paper develops non-asymptotic bounds and an online learning framework for off-policy estimation of treatment effects using adaptively collected data, applicable to contextual bandits and causal inference.
Contribution
It introduces a generic upper bound on the mean-squared error of augmented inverse propensity weighting (AIPW) estimators, together with a reduction scheme that uses online learning to estimate the treatment effect from adaptively collected data.
Findings
The mean-squared error bounds depend on a sequentially weighted error between the treatment effect and its estimates.
A reduction scheme uses online learning to produce the sequence of estimates and improve estimation accuracy.
AIPW estimators paired with no-regret algorithms are shown to be instance-dependent optimal.
Abstract
We consider estimation of a linear functional of the treatment effect using adaptively collected data. This task has a variety of applications, including off-policy evaluation (OPE) in contextual bandits and estimation of the average treatment effect (ATE) in causal inference. While a certain class of augmented inverse propensity weighting (AIPW) estimators enjoys desirable asymptotic properties, including semi-parametric efficiency, much less is known about their non-asymptotic theory with adaptively collected data. To fill this gap, we first establish generic upper bounds on the mean-squared error of the class of AIPW estimators that crucially depend on a sequentially weighted error between the treatment effect and its estimates. Motivated by this, we also propose a general reduction scheme that allows one to produce a sequence of estimates for…
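To make the idea in the abstract concrete, the following is a minimal sketch of an AIPW-style estimate of an off-policy value from adaptively collected data. It is not taken from the paper: the two-armed bandit, the reward means, the epsilon-greedy logging policy, the target policy, and the function name `aipw_off_policy_value` are all illustrative assumptions. The outcome estimate at each round is a running mean built only from earlier rounds, standing in for the sequence of estimates that an online learner would supply.

```python
import random


def aipw_off_policy_value(n_rounds=20000, seed=0):
    """Simulate a two-armed bandit; return (AIPW estimate, true target value).

    Everything here is a hypothetical toy setup, not the paper's experiment.
    """
    rng = random.Random(seed)
    true_means = [0.3, 0.7]      # assumed Bernoulli reward mean of each arm
    target_policy = [0.2, 0.8]   # assumed target policy pi(a) to evaluate
    true_value = sum(p * m for p, m in zip(target_policy, true_means))

    counts = [0, 0]              # number of pulls per arm so far
    sums = [0.0, 0.0]            # total reward per arm so far
    total = 0.0
    eps = 0.2                    # exploration rate of the logging policy

    for _ in range(n_rounds):
        # Outcome estimates that use only data from earlier rounds.
        mu_hat = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(2)]

        # Adaptive logging policy: epsilon-greedy on the current estimates,
        # so the propensities depend on the history but are known exactly.
        greedy = 0 if mu_hat[0] >= mu_hat[1] else 1
        propensity = [eps / 2, eps / 2]
        propensity[greedy] += 1 - eps

        action = 0 if rng.random() < propensity[0] else 1
        reward = 1.0 if rng.random() < true_means[action] else 0.0

        # AIPW score: plug-in term plus importance-weighted residual.
        plug_in = sum(target_policy[a] * mu_hat[a] for a in range(2))
        weight = target_policy[action] / propensity[action]
        total += plug_in + weight * (reward - mu_hat[action])

        # Update the outcome estimates only after the score is computed.
        counts[action] += 1
        sums[action] += reward

    return total / n_rounds, true_value


if __name__ == "__main__":
    estimate, truth = aipw_off_policy_value()
    print(f"AIPW estimate: {estimate:.4f}, true value: {truth:.4f}")
```

Because each outcome estimate depends only on the past and the logging propensities are known, every per-round score in this sketch is conditionally unbiased for the target value. The quality of the outcome estimates then affects only the variance, which is the quantity a better online learner would be expected to reduce.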
Taxonomy
Topics: Economic Policies and Impacts · Climate Change Policy and Economics
