A Unifying Framework for Robust and Efficient Inference with Unstructured Data

Jacob Carlson; Melissa Dell

arXiv:2505.00282 · econ.EM · February 20, 2026

TL;DR

This paper introduces MAR-S, a semiparametric framework that corrects for neural network prediction errors so that inference with unstructured data (text, images, audio, video) is unbiased and efficient. It addresses the bias, reproducibility, and robustness concerns that arise when network predictions are used as data.

Contribution

The study develops MAR-S, a novel framework that extends debiased inference methods to unstructured data, enabling robust and efficient analysis despite neural network prediction errors.

Findings

01. Provides unbiased estimators for causal and descriptive analysis.

02. Addresses inference with aggregated neural network predictions.

03. Connects machine learning prediction correction to causal inference principles.

Abstract

To analyze unstructured data (text, images, audio, video), economists typically first extract low-dimensional structured features with a neural network. Neural networks do not make generically unbiased predictions, and biases will propagate to estimators that use their predictions. While structured variables extracted from unstructured data have traditionally been treated as proxies - implicitly accepting arbitrary measurement error - this poses various challenges in an era where constantly evolving AI can cheaply extract data. Researcher degrees of freedom (e.g., the choice of neural network architecture, training data or prompts, and numerous implementation details) raise concerns about p-hacking and how to best show robustness, the frequent deprecation of proprietary neural networks complicates reproducibility, and researchers need a principled way to determine how accurate…
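
The bias-propagation point in the abstract can be illustrated with a small simulation. The code below is a minimal sketch of a generic correction, not the MAR-S estimator from the paper. It assumes that a random subsample of units is hand-labeled with a known probability `pi`, and it adds an inverse-probability-weighted residual to each prediction. The function names `debiased_mean` and `simulate`, the classifier error rates, and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def debiased_mean(preds, labels, pi):
    """Estimate E[Y] from predictions plus a randomly labeled subsample.

    preds  : model prediction for every unit
    labels : ground-truth value where a unit was hand-labeled, else None
    pi     : known probability that a unit was selected for labeling (assumed)
    """
    total = 0.0
    for p, y in zip(preds, labels):
        # Labeled units contribute an inverse-probability-weighted residual;
        # unlabeled units contribute only their prediction.
        correction = (y - p) / pi if y is not None else 0.0
        total += p + correction
    return total / len(preds)


def simulate(n=50000, pi=0.1, seed=0):
    """Compare the naive and corrected estimates on synthetic binary data."""
    rng = random.Random(seed)
    # True binary feature with mean of roughly 0.3.
    truth = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(n)]
    # Hypothetical classifier: detects true 1s with probability 0.9 and
    # raises false positives with probability 0.2, so its mean is biased upward.
    preds = [
        1.0 if rng.random() < (0.9 if y == 1.0 else 0.2) else 0.0
        for y in truth
    ]
    # Each unit is hand-labeled independently with probability pi.
    labels = [y if rng.random() < pi else None for y in truth]
    true_mean = sum(truth) / n
    naive = sum(preds) / n
    corrected = debiased_mean(preds, labels, pi)
    return true_mean, naive, corrected


if __name__ == "__main__":
    true_mean, naive, corrected = simulate()
    print(f"true={true_mean:.3f} naive={naive:.3f} corrected={corrected:.3f}")
```

In this setup the naive average of predictions overstates the true mean because false positives outnumber missed detections, while the corrected estimate is centered on the true mean at the cost of extra variance from the small labeled subsample.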

Taxonomy

Methods: Softmax · Attention Is All You Need