Linear Regressions with Combined Data

Xavier D'Haultfoeuille; Christophe Gaillac; Arnaud Maurel

arXiv:2412.04816 · econ.EM · November 18, 2025

Open Access

TL;DR

This paper develops a method for partially identifying linear regression coefficients when the outcome and some of the covariates are observed in separate datasets that cannot be matched. It provides sharp bounds and estimators of those bounds, with applications to racial disparities in patent approval rates and to the effect of patience and risk-taking on educational performance.

Contribution

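It shows that linear regression coefficients remain partially identified when the outcome and some covariates come from separate, unmatched datasets, without the exclusion restrictions that traditional approaches rely on, and it provides computationally simple estimators of the resulting bounds.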

Findings

01

Derived sharp bounds for regression coefficients without exclusion restrictions.

02

Developed asymptotically normal estimators for the bounds.

03

Applied the methodology to racial disparities in patent approval rates and to the effect of patience and risk-taking on educational performance.

Abstract

We study linear regressions in a context where the outcome of interest and some of the covariates are observed in two different datasets that cannot be matched. Traditional approaches obtain point identification by relying, often implicitly, on exclusion restrictions. We show that without such restrictions, coefficients of interest can still be partially identified, with the sharp bounds taking a simple form. We obtain tighter bounds when variables observed in both datasets, but not included in the regression of interest, are available, even if these variables are not subject to specific restrictions. We develop computationally simple and asymptotically normal estimators of the bounds. Finally, we apply our methodology to estimate racial disparities in patent approval rates and to evaluate the effect of patience and risk-taking on educational performance.
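
To make the idea of partial identification from unmatched data concrete, here is a minimal sketch of a textbook special case, not the paper's estimator. It assumes a single scalar covariate, no variables common to both datasets, and two samples of equal size. Under those assumptions only the marginal distributions of Y and X are known, so the covariance (and hence the slope) is bounded by the comonotone and countermonotone pairings of the two samples. The function name and the simulated data are hypothetical, chosen only for illustration.

```python
import numpy as np

def slope_bounds_from_marginals(y, x):
    """Bounds on the slope of y on scalar x when y and x are observed
    in separate, unmatched samples of equal size (illustrative assumption)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if y.ndim != 1 or y.shape != x.shape:
        raise ValueError("this sketch assumes two 1-D samples of equal size")
    ys, xs = np.sort(y), np.sort(x)
    var_x = xs.var()  # population-style variance (ddof=0)
    mean_prod = xs.mean() * ys.mean()
    # Pairing both samples in ascending order maximizes the covariance;
    # reversing one of them minimizes it (rearrangement inequality).
    cov_max = np.mean(xs * ys) - mean_prod
    cov_min = np.mean(xs * ys[::-1]) - mean_prod
    return cov_min / var_x, cov_max / var_x

# Simulated example: generate linked data, then discard the link.
rng = np.random.default_rng(0)
n = 5000
x_full = rng.normal(size=n)
y_full = 0.5 * x_full + rng.normal(size=n)
y_unmatched = rng.permutation(y_full)  # the matching is now lost

lower, upper = slope_bounds_from_marginals(y_unmatched, x_full)
# Slope one would compute if the two samples could be matched.
true_slope = np.cov(x_full, y_full, bias=True)[0, 1] / x_full.var()
print(f"bounds: [{lower:.3f}, {upper:.3f}], matched-sample slope: {true_slope:.3f}")
```

The matched-sample slope always lies inside the interval here because any pairing of the same two samples yields a covariance between the two extreme pairings. The abstract indicates that the paper's bounds cover more general regressions and tighten when variables observed in both datasets are available; this sketch does not attempt either.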

Taxonomy

Topics: Advanced Statistical Methods and Models