Design and Analysis Considerations for Causal Inference under Two-Phase Sampling in Observational Studies
Kazuharu Harada, Masataka Taguri

TL;DR
This paper derives the efficiency bounds for causal effect estimators under two-phase sampling and proposes methods to improve their efficiency by utilizing phase-1 information, supported by extensive simulations.
Contribution
It provides the first semiparametric efficiency bounds for weighted average treatment effects under two-phase sampling and introduces more efficient estimators leveraging phase-1 data.
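For orientation, the WATE family is commonly written in the balancing-weights form below. The notation is an assumption made here for illustration (X for covariates, A for the binary treatment, Y(a) for potential outcomes, e(X) for the propensity score, h for the tilting function); the paper's own symbols may differ.

```latex
\tau_h \;=\; \frac{E\!\left[\,h(X)\,\{Y(1) - Y(0)\}\,\right]}{E\!\left[\,h(X)\,\right]},
\qquad e(X) = P(A = 1 \mid X),
\qquad
h(X) =
\begin{cases}
1 & \text{(ATE)} \\
e(X) & \text{(ATT)} \\
e(X)\,\{1 - e(X)\} & \text{(ATO)}
\end{cases}
```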
Findings
Efficiency gains are substantial when incorporating phase-1 information.
The proposed estimators outperform naive methods in simulations.
Outcome-dependent sampling benefits from phase-1 data integration.
Abstract
Two-phase sampling is a simple and cost-effective estimation strategy in survey sampling and is widely used in practice. Because the phase-2 sampling probability typically depends on low-cost variables collected at phase 1, naive estimation based solely on the phase-2 sample generally results in biased inference. This issue arises even when estimating causal parameters such as the average treatment effect (ATE), and there has been growing interest in recent years in the proper estimation of such parameters under complex sampling designs (e.g., Nattino et al., 2025). In this paper, we derive the semiparametric efficiency bound for a broad class of weighted average treatment effects (WATE), which includes the ATE, the average treatment effect on the treated (ATT), and the average treatment effect on the overlapped population (ATO), under two-phase sampling. In addition to…
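The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's estimator: it is a plain Hájek-type inverse probability weighted ATE estimate, with and without phase-2 design weights, under a hypothetical outcome-dependent design. The data-generating model, the sampling probabilities (0.9 and 0.1), and the function name `hajek_ate` are all assumptions chosen for illustration, and the true propensity score and sampling probabilities are treated as known.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000        # phase-1 sample size (assumed)
true_ate = 2.0     # constant treatment effect in this toy model

# Phase 1: covariate, treatment, and outcome are observed for everyone.
x = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-0.5 * x))          # true propensity score P(A=1 | X)
a = rng.binomial(1, e)
y = true_ate * a + x + rng.normal(size=n)

# Phase 2: outcome-dependent sampling with known design probabilities.
pi = np.where(y > 1.0, 0.9, 0.1)
r = rng.binomial(1, pi).astype(bool)        # phase-2 inclusion indicator


def hajek_ate(y, a, e, w):
    """Normalized IPW estimate of the ATE with extra unit weights w."""
    w1 = w * a / e
    w0 = w * (1 - a) / (1 - e)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)


# Naive: use the phase-2 sample as if it were a simple random sample.
naive = hajek_ate(y[r], a[r], e[r], np.ones(r.sum()))

# Design-weighted: reweight each phase-2 unit by 1 / pi.
weighted = hajek_ate(y[r], a[r], e[r], 1.0 / pi[r])

print(f"true ATE: {true_ate:.2f}")
print(f"naive phase-2 estimate: {naive:.2f}")
print(f"design-weighted estimate: {weighted:.2f}")
```

In this toy design the naive estimate falls well below the true effect, while the design-weighted estimate recovers it up to sampling noise. The weighted estimator uses phase-1 information only through the sampling probabilities, so it says nothing about the efficiency bound or the more efficient estimators that the paper studies.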
Taxonomy
Topics: Advanced Causal Inference Techniques · Survey Methodology and Nonresponse · Statistical Methods and Bayesian Inference
