Adaptive Doubly Robust Estimator from Non-stationary Logging Policy under a Convergence of Average Probability
Masahiro Kato

TL;DR
This paper introduces an adaptive doubly robust estimator for causal inference in non-stationary adaptive experiments, allowing for fluctuating logging policies that converge on average, and demonstrates its asymptotic normality and empirical effectiveness.
Contribution
It proposes a novel assumption that the average logging policy converges, enabling the DR estimator to handle non-stationary policies in adaptive experiments.
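One way to read "the average logging policy converges" is sketched below. The notation is an assumption made here for illustration and is not taken from the paper: $\pi_t(a \mid x, \Omega_{t-1})$ denotes the probability of choosing action $a$ at round $t$ given context $x$ and the history $\Omega_{t-1}$, and $\alpha(a \mid x)$ denotes a hypothetical time-invariant limit. The paper's exact statement may differ, for instance in the quantity being averaged or in the mode of convergence.

```latex
% Illustrative formalization only; symbols are assumptions, not the paper's notation.
% The per-round policy \pi_t may keep fluctuating; only its time average must settle.
\[
  \frac{1}{T} \sum_{t=1}^{T} \pi_t(a \mid x, \Omega_{t-1})
  \;\xrightarrow{\;p\;}\; \alpha(a \mid x)
  \qquad \text{as } T \to \infty .
\]
% Contrast: a pointwise condition would instead require
% \pi_t(a \mid x, \Omega_{t-1}) \xrightarrow{p} \alpha(a \mid x) for the policy itself.
```

Under this reading, a policy that oscillates forever around a fixed value would satisfy the averaged condition even though it violates the pointwise one.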
Findings
The estimator is asymptotically normal under the new assumption.
Simulation results confirm the empirical effectiveness of the proposed method.
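To make the setting concrete, here is a minimal simulation sketch, not the paper's implementation. It assumes a two-action experiment without covariates, a hypothetical logging policy that oscillates deterministically around 0.5 (so it never converges itself, but its time average does), and a doubly robust score whose outcome model is the running mean of past outcomes only. The function names `simulate` and `adaptive_dr_mean` and all parameter values are illustrative choices.

```python
import numpy as np


def simulate(T=20000, mu1=1.0, seed=0):
    """Generate logged data under an oscillating (non-stationary) logging policy."""
    rng = np.random.default_rng(seed)
    t = np.arange(T)
    # Probability of choosing action 1: stays in [0.2, 0.8], never settles,
    # but its average over time is 0.5.
    probs = 0.5 + 0.3 * np.sin(2 * np.pi * t / 50)
    actions = (rng.random(T) < probs).astype(int)
    # Action 1 has mean outcome mu1, action 0 has mean outcome 0; unit-variance noise.
    outcomes = np.where(actions == 1, mu1, 0.0) + rng.normal(size=T)
    return actions, outcomes, probs


def adaptive_dr_mean(actions, outcomes, probs, target=1):
    """Doubly robust style estimate of the mean outcome of `target`.

    The outcome model at round t is the running mean of outcomes observed
    for `target` in rounds before t, so it depends only on past data.
    """
    actions = np.asarray(actions)
    outcomes = np.asarray(outcomes, dtype=float)
    probs = np.asarray(probs, dtype=float)
    total, count = 0.0, 0  # running sum and count for the target action
    scores = np.empty(len(actions))
    for t in range(len(actions)):
        f_prev = total / count if count > 0 else 0.0  # uses rounds < t only
        chosen = float(actions[t] == target)
        # Inverse-probability-weighted residual plus the plug-in prediction.
        scores[t] = chosen * (outcomes[t] - f_prev) / probs[t] + f_prev
        if chosen:
            total += outcomes[t]
            count += 1
    return scores.mean()


if __name__ == "__main__":
    actions, outcomes, probs = simulate()
    print("estimated mean outcome of action 1:",
          adaptive_dr_mean(actions, outcomes, probs))
```

For simplicity the policy here depends only on the round index, not on past observations; a genuinely adaptive policy would update `probs[t]` from the history, and the score would be computed the same way as long as the probability actually used at each round is recorded.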
Abstract
Adaptive experiments, including efficient average treatment effect estimation and multi-armed bandit algorithms, have garnered attention in various applications, such as social experiments, clinical trials, and online advertisement optimization. This paper considers estimating the mean outcome of an action from samples obtained in adaptive experiments. In causal inference, the mean outcome of an action plays a crucial role, and its estimation is an essential task, of which average treatment effect estimation and off-policy value estimation are variants. In adaptive experiments, the probability of choosing an action (the logging policy) is allowed to be sequentially updated based on past observations. Because the logging policy depends on past observations, the samples are often not independent and identically distributed (i.i.d.), making developing an asymptotically normal…
Taxonomy
Topics: Advanced Causal Inference Techniques · Advanced Bandit Algorithms Research · Statistical Methods and Inference
