Econometric Inference with Machine-Learned Proxies: Partial Identification via Data Combination

Lixiong Li

arXiv:2604.10770 · econ.EM · April 14, 2026

TL;DR

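This paper introduces an econometric framework for partial identification and inference in moment models with machine-learned proxies. It combines a downstream sample with an auxiliary validation sample and uses optimal transport to avoid the bias of naive plug-in estimation, without requiring the upstream ML procedure to be consistent or to have a known convergence rate.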

Contribution

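It develops a partial identification approach that combines two datasets via optimal transport. The approach requires neither consistency of the upstream ML procedure nor a complete validation sample containing all variables used in the downstream analysis; an auxiliary validation sample with joint observations on the proxy and its target is enough.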

Findings

01 Reliable size control demonstrated in Monte Carlo simulations

02 Provides informative confidence sets across various predictive scenarios

03 Does not require restrictive assumptions on ML procedures

Abstract

Empirical researchers increasingly use upstream machine-learning (ML) methods to construct proxies for latent target variables from complex, unstructured data. A naive plug-in use of such proxies in downstream econometric models, however, can lead to biased estimation and invalid inference. This paper develops a framework for partial identification and inference in general moment models with ML-generated proxies. Our approach does not require restrictive assumptions on the upstream ML procedure, such as consistency or known convergence rates, nor does it require a complete validation sample containing all variables used in the downstream analysis. Instead, we assume access to two datasets: a downstream sample containing observed covariates and the proxy, and an auxiliary validation sample containing joint observations on the proxy and its target variable. We treat the proxy as a linking…
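
The data-combination idea in the abstract can be illustrated with a small sketch. This is not the paper's estimator or inference procedure. It is a textbook special case under assumptions stated here: the proxy is discrete, the downstream moment of interest is E[X·Y] for a scalar covariate X and scalar latent target Y, and the distribution of Y given the proxy is the same in both samples. Within each proxy cell, X comes only from the downstream sample and Y only from the validation sample, so their joint distribution is unknown. The one-dimensional optimal transport couplings (comonotone and countermonotone) then give upper and lower bounds on the product moment. The names `empirical_quantile` and `product_moment_bounds` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def empirical_quantile(values, u):
    """Empirical quantile function of `values` evaluated at points u in (0, 1)."""
    s = np.sort(np.asarray(values, dtype=float))
    # floor(u * n) picks the order statistic that covers probability level u.
    idx = np.minimum((u * len(s)).astype(int), len(s) - 1)
    return s[idx]


def product_moment_bounds(x_down, proxy_down, y_val, proxy_val, grid_size=2000):
    """Bounds on E[X * Y] when X and Y are never observed together.

    Assumptions (illustrative, not taken from the paper):
      * the proxy is discrete and links the two samples;
      * Y given the proxy has the same distribution in both samples.
    """
    x_down = np.asarray(x_down, dtype=float)
    y_val = np.asarray(y_val, dtype=float)
    proxy_down = np.asarray(proxy_down)
    proxy_val = np.asarray(proxy_val)

    # Midpoint grid on (0, 1); it is symmetric, so reversing it maps u to 1 - u.
    u = (np.arange(grid_size) + 0.5) / grid_size

    lower, upper = 0.0, 0.0
    for cell in np.unique(proxy_down):
        x_cell = x_down[proxy_down == cell]
        y_cell = y_val[proxy_val == cell]
        if len(y_cell) == 0:
            raise ValueError(f"no validation observations for proxy value {cell!r}")

        # Weight each cell by its share of the downstream sample.
        weight = len(x_cell) / len(x_down)
        qx = empirical_quantile(x_cell, u)
        qy = empirical_quantile(y_cell, u)

        # Comonotone coupling maximizes E[XY]; countermonotone minimizes it.
        upper += weight * np.mean(qx * qy)
        lower += weight * np.mean(qx * qy[::-1])
    return lower, upper
```

With `x_down = [1, 2, 3, 4]`, `proxy_down = [0, 0, 1, 1]`, `y_val = [10, 20, 30, 40]`, and `proxy_val = [0, 0, 1, 1]`, the bounds evaluate to 70 and 75. A more predictive proxy makes Y less dispersed within each cell, which narrows the interval; a general moment model would call for a full optimal transport solver rather than this sorting shortcut.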
