Machine-Learning-Powered Specification Testing in Linear Instrumental Variable Models
Cyrill Scheidegger, Malte Londschien, and Peter Bühlmann

TL;DR
This paper introduces a machine learning-based specification test for linear IV models that works in both overidentified and just-identified cases, using residual prediction and sample splitting.
Contribution
It develops a novel, flexible testing method applicable to various IV settings, including weak and many instruments, with implementation in R and Python packages.
Findings
The test controls type I error asymptotically.
It is consistent against broad alternatives.
The method extends to weak and many instruments settings.
Abstract
The linear instrumental variable (IV) model is widely used in observational studies, yet its validity hinges on strong assumptions. Classical specification tests such as the Sargan-Hansen J test are limited to overidentified settings and are therefore not applicable in the common just-identified case, where the number of instruments is equal to the number of endogenous variables. We propose a novel test for the well-specification of the linear IV model under the assumption that the structural error is mean independent of the instruments. This assumption enables specification testing even in the just-identified setting. Our approach uses the idea of residual prediction: if the two-stage least squares residuals can be predicted from the instruments better than chance, this indicates misspecification. The resulting test employs sample splitting and a user-chosen machine learning method,…
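The residual-prediction idea in the abstract can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' test statistic or their R/Python packages: the data-generating process, the helper names (`tsls_residuals`, `features`, `out_of_sample_corr`, `simulate`), and the cubic-polynomial least-squares fit standing in for the "user-chosen machine learning method" are all hypothetical choices made here. It only reports an out-of-sample correlation; a valid test would additionally have to account for the estimation error in the two-stage least squares coefficients.

```python
import numpy as np


def tsls_residuals(z, x, y):
    """Residuals from just-identified 2SLS (one instrument, one regressor, intercept)."""
    Zm = np.column_stack([np.ones_like(z), z])
    Xm = np.column_stack([np.ones_like(x), x])
    # Just-identified IV estimator: solve (Z'X) beta = Z'y.
    beta = np.linalg.solve(Zm.T @ Xm, Zm.T @ y)
    return y - Xm @ beta


def features(z):
    """Cubic polynomial features; an illustrative stand-in for an ML learner."""
    return np.column_stack([np.ones_like(z), z, z**2, z**3])


def out_of_sample_corr(z, x, y):
    """Fit a residual predictor on one half, evaluate it on the other half."""
    half = len(z) // 2
    a, b = slice(0, half), slice(half, None)
    # Half A: learn to predict the 2SLS residuals from the instrument.
    res_a = tsls_residuals(z[a], x[a], y[a])
    coef, *_ = np.linalg.lstsq(features(z[a]), res_a, rcond=None)
    # Half B: check whether the learned predictor tracks fresh residuals.
    res_b = tsls_residuals(z[b], x[b], y[b])
    pred_b = features(z[b]) @ coef
    return np.corrcoef(pred_b, res_b)[0, 1]


def simulate(n, misspecified, rng):
    """Hypothetical design: h is an unobserved confounder, z the instrument."""
    z = rng.normal(size=n)
    h = rng.normal(size=n)
    x = z + h + rng.normal(size=n)
    # Under misspecification the instrument affects y directly and nonlinearly,
    # so the structural error is no longer mean independent of z.
    g = z**2 - 1 if misspecified else 0.0
    y = 2.0 * x + g + h + rng.normal(size=n)
    return z, x, y


rng = np.random.default_rng(0)
corr_null = out_of_sample_corr(*simulate(4000, False, rng))
corr_alt = out_of_sample_corr(*simulate(4000, True, rng))
print(f"well-specified: {corr_null:.3f}, misspecified: {corr_alt:.3f}")
```

In this design the linear part of the instrument is uninformative about the residuals by construction, because just-identified 2SLS residuals are orthogonal to the instrument in-sample. Any predictive signal therefore has to come from a nonlinear function of the instrument, which is why a flexible learner is needed.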
