A Unifying Framework for Robust and Efficient Inference with Unstructured Data
Jacob Carlson, Melissa Dell

TL;DR
This paper introduces MAR-S, a semiparametric framework that corrects for neural network prediction errors so that inference with unstructured data such as text, images, and audio is debiased and efficient, while also addressing reproducibility and robustness concerns.
Contribution
The study develops MAR-S, a novel framework that extends debiased inference methods to unstructured data, enabling robust and efficient analysis despite neural network prediction errors.
Findings
Provides unbiased estimators for causal and descriptive analysis.
Addresses inference with aggregated neural network predictions.
Connects machine learning prediction correction to causal inference principles.
Abstract
To analyze unstructured data (text, images, audio, video), economists typically first extract low-dimensional structured features with a neural network. Neural networks do not make generically unbiased predictions, and biases will propagate to estimators that use their predictions. While structured variables extracted from unstructured data have traditionally been treated as proxies (implicitly accepting arbitrary measurement error), this poses various challenges in an era where constantly evolving AI can cheaply extract data. Researcher degrees of freedom (e.g., the choice of neural network architecture, training data or prompts, and numerous implementation details) raise concerns about p-hacking and how to best show robustness; the frequent deprecation of proprietary neural networks complicates reproducibility; and researchers need a principled way to determine how accurate…
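The abstract's point that prediction bias propagates into downstream estimators can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's MAR-S estimator: it uses a simple difference-style correction that adds the average prediction error, measured on a uniformly random ground-truth subsample, to the plug-in mean. The classifier error rates, sample sizes, and the function names `naive_mean`, `corrected_mean`, and `simulate` are all hypothetical.

```python
import random
import statistics


def naive_mean(preds):
    """Plug-in estimate: average the predictions as if they were ground truth."""
    return statistics.fmean(preds)


def corrected_mean(preds, labeled_idx, labels):
    """Plug-in mean plus the average prediction error on a labeled subsample.

    labeled_idx: indices drawn uniformly at random for ground-truth annotation.
    labels: ground-truth values aligned with labeled_idx.
    """
    residuals = [y - preds[i] for i, y in zip(labeled_idx, labels)]
    return statistics.fmean(preds) + statistics.fmean(residuals)


def simulate(n=20000, n_labeled=2000, seed=0):
    """Simulate a biased binary classifier and compare the two estimators."""
    rng = random.Random(seed)
    # True binary feature with prevalence 0.30 (assumed for illustration).
    truth = [1 if rng.random() < 0.30 else 0 for _ in range(n)]
    # Hypothetical classifier: detects 60% of positives, 5% false-positive rate.
    preds = []
    for y in truth:
        hit_rate = 0.60 if y == 1 else 0.05
        preds.append(1 if rng.random() < hit_rate else 0)
    # Ground truth is collected only on a uniformly random subsample.
    labeled_idx = rng.sample(range(n), n_labeled)
    labels = [truth[i] for i in labeled_idx]
    return {
        "truth": statistics.fmean(truth),
        "naive": naive_mean(preds),
        "corrected": corrected_mean(preds, labeled_idx, labels),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumed error rates the plug-in mean concentrates near 0.30 × 0.60 + 0.70 × 0.05 = 0.215 rather than the true prevalence of 0.30, while the corrected estimate is centered on the truth, at the cost of extra variance from the labeled subsample. The correction relies on the labeled subsample being drawn at random.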
Taxonomy
Methods: Softmax · Attention Is All You Need
