Long-Term Causal Inference with Many Noisy Proxies
Apoorva Lal, Guido Imbens, Peter Hull

TL;DR
This paper introduces a method for estimating long-term treatment effects from many noisy short-term proxies, formalizes the problem as a latent variable model, and shows that regularized regression techniques are effective in this setting.
Contribution
The paper formalizes long-term causal inference with noisy proxies as a latent variable problem and shows how regularized regression improves the bias-variance tradeoff in this setting.
Findings
Regularized regression methods outperform naive proxy selection.
The bias of Ridge regression decreases as more proxies are added, and closed-form expressions characterize the bias-variance tradeoff.
An empirical application to the California GAIN experiment illustrates the method in practice.
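To see why adding proxies can shrink Ridge bias, consider a stylized one-factor case. This is our own illustrative assumption, not necessarily the paper's model or its closed-form expressions: a scalar latent surrogate S with Var(S) = 1, proxies X_j = S + e_j with i.i.d. noise of variance σ², a long-term outcome Y = βS + u, and a treatment that shifts S by τ_S while leaving the proxy noise unchanged. The population Ridge coefficients with penalty λ, and the bias of the implied long-term effect, are:

```latex
\begin{aligned}
\Sigma_X &= \mathbf{1}\mathbf{1}^\top + \sigma^2 I_p, \qquad \Sigma_{XY} = \beta\,\mathbf{1},\\
b_\lambda &= (\Sigma_X + \lambda I_p)^{-1}\Sigma_{XY} = \frac{\beta}{p + \sigma^2 + \lambda}\,\mathbf{1}
  \quad\text{(since } \mathbf{1} \text{ is an eigenvector of } \Sigma_X \text{ with eigenvalue } p + \sigma^2\text{)},\\
\hat\tau &= b_\lambda^\top(\tau_S\,\mathbf{1}) = \beta\tau_S\,\frac{p}{p + \sigma^2 + \lambda},\\
\hat\tau - \beta\tau_S &= -\beta\tau_S\,\frac{\sigma^2 + \lambda}{p + \sigma^2 + \lambda}.
\end{aligned}
```

For any fixed λ the bias vanishes as p grows, whereas a single unpenalized proxy (p = 1, λ = 0) leaves a bias of -βτ_S σ²/(1 + σ²) however large the sample. This population calculation ignores estimation variance, which is what λ trades against bias in finite samples.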
Abstract
We propose a method for estimating long-term treatment effects with many short-term proxy outcomes: a central challenge when experimenting on digital platforms. We formalize this challenge as a latent variable problem where observed proxies are noisy measures of a low-dimensional set of unobserved surrogates that mediate treatment effects. Through theoretical analysis and simulations, we demonstrate that regularized regression methods substantially outperform naive proxy selection. We show in particular that the bias of Ridge regression decreases as more proxies are added, with closed-form expressions for the bias-variance tradeoff. We illustrate our method with an empirical application to the California GAIN experiment.
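The latent variable setup in the abstract can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: we assume a surrogate-index-style pipeline (fit the long-term outcome on proxies in an observational sample, then compare mean predicted outcomes across arms of an experiment), a one-factor data-generating process in which every proxy is the latent surrogate plus independent noise, and we take "naive proxy selection" to mean using a single proxy. The function names `simulate`, `ridge_coef`, and `surrogate_index_effect` and all parameter values are hypothetical.

```python
import numpy as np


def simulate(n, p, sigma2, beta, tau_s, rng):
    """Hypothetical one-factor DGP: each proxy X_j = S + e_j, and Y = beta * S + u.

    Returns an observational sample (proxies and long-term outcome, no treatment)
    and an experimental sample (proxies and treatment; long-term outcome unseen).
    """
    noise_sd = np.sqrt(sigma2)
    s_obs = rng.normal(size=n)                      # latent surrogate, Var(S) = 1
    x_obs = s_obs[:, None] + rng.normal(scale=noise_sd, size=(n, p))
    y_obs = beta * s_obs + rng.normal(size=n)       # outcome depends on X only via S
    w = rng.integers(0, 2, size=n)                  # randomized binary treatment
    s_exp = tau_s * w + rng.normal(size=n)          # treatment shifts the surrogate
    x_exp = s_exp[:, None] + rng.normal(scale=noise_sd, size=(n, p))
    return x_obs, y_obs, x_exp, w


def ridge_coef(x, y, lam):
    """Ridge slopes on centered data, minimizing ||yc - xc b||^2 + lam * n * ||b||^2."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    n, p = x.shape
    return np.linalg.solve(xc.T @ xc + lam * n * np.eye(p), xc.T @ yc)


def surrogate_index_effect(x_obs, y_obs, x_exp, w, lam):
    """Fit Y on proxies observationally, then difference mean predictions by arm."""
    b = ridge_coef(x_obs, y_obs, lam)
    pred = x_exp @ b                                # intercept cancels in the difference
    return pred[w == 1].mean() - pred[w == 0].mean()


rng = np.random.default_rng(0)
beta, tau_s, sigma2, lam = 2.0, 0.5, 4.0, 1.0       # true long-term effect = beta * tau_s
x_obs, y_obs, x_exp, w = simulate(20_000, 50, sigma2, beta, tau_s, rng)
ridge_all = surrogate_index_effect(x_obs, y_obs, x_exp, w, lam)
naive_one = surrogate_index_effect(x_obs[:, :1], y_obs, x_exp[:, :1], w, lam=0.0)
print(f"true {beta * tau_s:.3f}  ridge (50 proxies) {ridge_all:.3f}  single proxy {naive_one:.3f}")
```

In this stylized DGP the Ridge estimate should settle near βτ_S · p/(p + σ² + λ), so its attenuation shrinks as p grows, while the single-proxy estimate stays attenuated by the factor 1/(1 + σ²) regardless of sample size. Scaling the penalty by n keeps λ comparable to a population-level penalty as the sample grows.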
Taxonomy
Topics: Advanced Causal Inference Techniques · Consumer Market Behavior and Pricing · Advanced Bandit Algorithms Research
