Learning causal effects from many randomized experiments using regularized instrumental variables
Alexander Peysakhovich, Dean Eckles

TL;DR
This paper introduces a regularized instrumental variables approach with a novel cross-validation method to accurately estimate causal effects from large collections of randomized experiments, especially when effects are small and metadata is lacking.
Contribution
It proposes a sparsity-inducing l0 regularization for IV estimation and a new cross-validation method (IVCV) that uses summary statistics, improving causal inference in large experiment collections.
Findings
Standard two-stage least squares (2SLS) is biased even when the number of experiments is infinite.
L0 regularization reduces bias and improves interventional predictions.
IVCV enables regularization parameter selection using only summary statistics.
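The first two findings can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the data-generating process, all function names, and the parameter values are hypothetical, and the l0 penalty is read here as hard thresholding of the per-experiment first-stage effects (experiments whose estimated effect on the treatment variable falls below a threshold are dropped). Group means are simulated directly, so only summary statistics are used.

```python
import numpy as np


def simulate_group_means(num_groups=5000, units_per_group=100, beta=1.0,
                         confounding=2.0, share_nonzero=0.05, seed=0):
    """Simulate per-experiment means of X and Y (hypothetical setup).

    Most experiments have no effect on X; a small share have sizeable
    effects. An unobserved confounder shifts both X and Y.
    """
    rng = np.random.default_rng(seed)
    nonzero = rng.random(num_groups) < share_nonzero
    first_stage = np.where(nonzero, rng.normal(0.0, 1.0, num_groups), 0.0)
    # Sampling noise in a group mean shrinks with the group size.
    noise_sd = 1.0 / np.sqrt(units_per_group)
    u_bar = rng.normal(0.0, noise_sd, num_groups)  # confounder mean
    x_bar = first_stage + u_bar + rng.normal(0.0, noise_sd, num_groups)
    y_bar = (beta * x_bar + confounding * u_bar
             + rng.normal(0.0, noise_sd, num_groups))
    return x_bar, y_bar


def tsls_from_group_means(x_bar, y_bar):
    """2SLS with group indicators as instruments and equal group sizes.

    This reduces to regressing group means of Y on group means of X.
    No intercept is fitted because the simulated means are centered at zero.
    """
    return float(x_bar @ y_bar / (x_bar @ x_bar))


def l0_tsls_from_group_means(x_bar, y_bar, threshold):
    """Hard-threshold the first stage, then run the second stage."""
    keep = np.abs(x_bar) > threshold
    if not keep.any():
        raise ValueError("threshold removed every experiment")
    x_hat = np.where(keep, x_bar, 0.0)  # dropped groups get a zero fit
    return float(x_hat @ y_bar / (x_hat @ x_hat))


if __name__ == "__main__":
    x_bar, y_bar = simulate_group_means()
    print("true effect:      ", 1.0)
    print("plain 2SLS:       ", tsls_from_group_means(x_bar, y_bar))
    print("thresholded 2SLS: ", l0_tsls_from_group_means(x_bar, y_bar, 0.5))
```

In this setup the plain estimate is pulled toward the confounded observational relationship because the noise in each group mean of X is correlated with the noise in Y. Adding more experiments of the same kind does not remove that pull, whereas keeping only experiments with clearly nonzero first-stage effects shrinks it. The threshold of 0.5 is picked by hand here; choosing it from data is the role the summary assigns to IVCV, which is not reproduced in this sketch.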
Abstract
Scientific and business practices are increasingly resulting in large collections of randomized experiments. Analyzed together, these collections can tell us things that individual experiments in the collection cannot. We study how to learn causal relationships between variables from the kinds of collections faced by modern data scientists: the number of experiments is large, many experiments have very small effects, and the analyst lacks metadata (e.g., descriptions of the interventions). Here we use experimental groups as instrumental variables (IV) and show that a standard method (two-stage least squares) is biased even when the number of experiments is infinite. We show how a sparsity-inducing l0 regularization can, in a reversal of the standard bias-variance tradeoff in regularization, reduce bias (and thus error) of interventional predictions. Because we are interested in…
Taxonomy
Topics: Statistical Methods and Inference · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning
