Two-stage Least Squares with Clustered Data under the Local Average Treatment Effect Framework
Anqi Zhao, Peng Ding, Fan Li

TL;DR
This paper compares two-stage least squares methods for clustered data within the LATE framework, analyzing their validity, efficiency, and heterogeneity detection.
Contribution
It provides theoretical insights into when each 2sls approach is valid and efficient, and introduces a test for cluster heterogeneity.
Findings
When clusters are homogeneous, both methods yield valid large-sample inference for the LATE.
2sfe is more efficient than canonical 2sls only when the variation in cluster-specific effects dominates idiosyncratic variation.
When clusters are heterogeneous, 2sfe estimates a weighted average of cluster-specific LATEs.
Abstract
To estimate the causal effect of an endogenous treatment using clustered data, the canonical two-stage least squares (2sls) estimates a linear regression of the outcome on treatment status using an instrumental variable (IV) and conducts inference with cluster-robust standard errors. When both the treatment and the IV vary within clusters, an alternative two-stage least squares with fixed effects (2sfe) additionally includes cluster indicators in the regression, thereby incorporating cluster information into point estimation as well. This paper studies the trade-off between these approaches within the local average treatment effect (LATE) framework. When clusters are homogeneous, we show that both approaches yield valid large-sample inference for the LATE, and that 2sfe is more efficient than canonical 2sls only when the variation in cluster-specific effects dominates idiosyncratic…
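The two point estimators described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the function names (`iv_slope`, `demean_by_cluster`, `tsls`, `tsfe`) and the simulated data-generating process are assumptions made here. It uses the standard fact that, with one treatment and one instrument, including cluster indicators is equivalent to subtracting cluster means from the outcome, treatment, and IV before applying the IV ratio. Cluster-robust standard errors are omitted for brevity.

```python
import numpy as np

def iv_slope(y, d, z):
    """Just-identified IV estimate: sample cov(z, y) / sample cov(z, d)."""
    zc = z - z.mean()
    return float((zc @ y) / (zc @ d))

def demean_by_cluster(x, cluster):
    """Subtract each cluster's mean from x (the within transformation)."""
    _, idx = np.unique(cluster, return_inverse=True)
    means = np.bincount(idx, weights=x) / np.bincount(idx)
    return x - means[idx]

def tsls(y, d, z):
    """Canonical 2sls point estimate: intercept only, no cluster indicators."""
    return iv_slope(y, d, z)

def tsfe(y, d, z, cluster):
    """2sls with cluster fixed effects, computed on cluster-demeaned data."""
    return iv_slope(
        demean_by_cluster(y, cluster),
        demean_by_cluster(d, cluster),
        demean_by_cluster(z, cluster),
    )

# Hypothetical simulation with homogeneous clusters (all values assumed).
rng = np.random.default_rng(0)
G, n = 200, 50                       # number of clusters, units per cluster
N = G * n
cluster = np.repeat(np.arange(G), n)
alpha = rng.normal(0.0, 1.0, G)[cluster]        # cluster-level intercepts
z = rng.integers(0, 2, N).astype(float)         # IV varies within clusters
# Compliance type: 0 = complier, 1 = always-taker, 2 = never-taker.
ctype = rng.choice(3, size=N, p=[0.6, 0.2, 0.2])
d = np.where(ctype == 0, z, (ctype == 1).astype(float))
tau = 2.0                                       # constant effect, so LATE = tau
# Always-takers get a baseline shift, which makes the treatment endogenous.
y = alpha + tau * d + 0.5 * (ctype == 1) + rng.normal(0.0, 1.0, N)

est_2sls = tsls(y, d, z)
est_2sfe = tsfe(y, d, z, cluster)
print(f"2sls: {est_2sls:.3f}  2sfe: {est_2sfe:.3f}  (true LATE = {tau})")
```

In this simulated homogeneous setting both estimates should land close to the true LATE. The sketch only contrasts the point estimators; it does not reproduce the paper's efficiency comparison or its test for cluster heterogeneity.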
