Machine learning the first stage in 2SLS: Practical guidance from bias decomposition and simulation

Connor Lennon; Edward Rubin; Glen Waddell

arXiv:2505.13422 · econ.EM · May 20, 2025

TL;DR

This paper studies machine learning (ML) in the first stage of two-stage least squares (2SLS) estimation. It decomposes the resulting bias into three components and uses simulations to identify when ML improves or worsens second-stage estimates.

Contribution

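The paper offers a three-component bias decomposition and simulation results to guide the use of ML methods in 2SLS. It distinguishes linear methods (e.g., post-Lasso), which work well, from nonlinear methods (e.g., random forests, neural nets), which can generate substantial bias.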

Findings

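01. Linear ML methods (e.g., post-Lasso) perform well in the first stage of 2SLS.

02. Nonlinear ML methods (e.g., random forests, neural nets) can introduce substantial bias in second-stage estimates.

03. The bias from nonlinear methods can exceed that of endogenous OLS.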

Abstract

Machine learning (ML) primarily evolved to solve "prediction problems." The first stage of two-stage least squares (2SLS) is a prediction problem, suggesting potential gains from ML first-stage assistance. However, little guidance exists on when ML helps 2SLS, or when it hurts. We investigate the implications of inserting ML into 2SLS, decomposing the bias into three informative components. Mechanically, ML-in-2SLS procedures face issues common to prediction and causal-inference settings, and their interaction. Through simulation, we show linear ML methods (e.g., post-Lasso) work well, while nonlinear methods (e.g., random forests, neural nets) generate substantial bias in second-stage estimates, potentially exceeding the bias of endogenous OLS.
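
To make the mechanics concrete, the following is a minimal sketch of 2SLS with interchangeable first stages. It is not the authors' simulation design. The data-generating process (one instrument `z`, one endogenous regressor `x`, an unobserved confounder `u`, true effect `beta = 1`) and the helper names (`slope`, `linear_first_stage`, `window_first_stage`, `simulate`) are all assumptions made for illustration. The rank-window smoother is a crude stand-in for a flexible nonlinear learner fitted in-sample, not a random forest or neural net. It illustrates one plausible mechanism only: the more closely the first stage fits `x`, the more endogenous variation leaks into the fitted values.

```python
import random


def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def linear_first_stage(z, x):
    """Fitted values from a linear regression of x on z."""
    n = len(z)
    b = slope(z, x)
    a = sum(x) / n - b * sum(z) / n
    return [a + b * zi for zi in z]


def window_first_stage(z, x, k):
    """In-sample fitted values: average x over the ~k observations
    closest to each point in the rank order of z.
    Hypothetical stand-in for a flexible learner; k = 1 interpolates x."""
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i])
    half = k // 2
    xhat = [0.0] * n
    for pos, i in enumerate(order):
        lo, hi = max(0, pos - half), min(n, pos + half + 1)
        xhat[i] = sum(x[order[j]] for j in range(lo, hi)) / (hi - lo)
    return xhat


def simulate(n=2000, beta=1.0, seed=0):
    """Assumed DGP: u confounds x and y; z shifts x but not y directly."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.5 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [beta * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return z, x, y


z, x, y = simulate()
results = {
    "ols": slope(x, y),  # endogenous OLS benchmark
    "2sls_linear": slope(linear_first_stage(z, x), y),
    "2sls_window_k25": slope(window_first_stage(z, x, 25), y),
    "2sls_window_k1": slope(window_first_stage(z, x, 1), y),
}
for name, estimate in results.items():
    print(f"{name:>16}: {estimate:.3f}")
```

With `k = 1` the smoother reproduces `x` exactly, so the second stage collapses to the endogenous OLS estimate. The linear first stage uses only variation in `x` explained by `z`. Intermediate values of `k` sit between these two cases in how much noise from `x` they absorb, and changing `k`, `n`, or the instrument strength (the `0.5` coefficient) shows how the estimates move under these assumptions.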

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Gaussian Processes and Bayesian Inference · Model Reduction and Neural Networks