Correlated Confounding Variables Are Not Easily Controlled for in Large Survey Research
William H. Press

TL;DR
This paper highlights the difficulty of controlling for all confounding variables in large survey data, demonstrating that residual confounding can lead to misleading associations despite extensive regression efforts.
Contribution
It provides concrete examples and formulas showing how unmeasured confounders can produce spurious correlations, challenging assumptions in survey data analysis.
Findings
Regression does not fully eliminate confounding effects.
Residual confounding can produce false associations.
Formulas quantify the impact of unmeasured confounders.
Abstract
Results in epidemiology and social science often require the removal of confounding effects from measurements of the pairwise correlation of variables in survey data. This is typically accomplished by some variant of linear regression (e.g., "logistic" or "Cox proportional"). But knowing whether all possible confounders have been identified, or are even visible (not latent), is in general impossible. Here, we exhibit two examples that frame the issue. The first example proposes a highly unlikely hypothesis on drug use, draws data from a large, respected survey, and succeeds in "proving" the implausible hypothesis, despite regressing out more than 20 confounding variables. The second constructs a "metamodel" in which a single (by hypothesis unmeasurable) latent variable affects many mutually correlated confounders. From simulations, we derive formulas for the magnitude of spurious…
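The metamodel idea can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the paper's model or its formulas: a hypothetical latent variable `u` drives an exposure `x`, an outcome `y`, and `k` confounders that are each a noisy copy of `u` (so they are mutually correlated). `x` has no causal effect on `y`, yet ordinary least squares on `x` plus all `k` confounders still returns a nonzero coefficient on `x`. The helper `predicted_coefficient` is the population value under this toy Gaussian model, obtained from standard Gaussian conditioning (Var(U | C) = 1/(1 + k a²)); the names `a`, `b`, `c`, `spurious_coefficient`, and `predicted_coefficient` are all hypothetical.

```python
import numpy as np


def spurious_coefficient(n=100_000, k=20, a=0.5, b=1.0, c=1.0, seed=0):
    """Fit y ~ 1 + x + C_1..C_k on simulated data and return the coefficient on x.

    Hypothetical toy metamodel: a latent u drives x, y, and k noisy confounders.
    x has no causal effect on y, so any nonzero coefficient is spurious.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)                            # latent variable, never observed
    conf = a * u[:, None] + rng.standard_normal((n, k))  # k confounders, correlated via u
    x = b * u + rng.standard_normal(n)                    # "exposure"
    y = c * u + rng.standard_normal(n)                    # "outcome": no x -> y term at all
    design = np.column_stack([np.ones(n), x, conf])       # intercept, x, every confounder
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)     # ordinary least squares
    return beta[1]


def predicted_coefficient(k=20, a=0.5, b=1.0, c=1.0):
    """Population value of the x coefficient under the same toy Gaussian model."""
    v = 1.0 / (1.0 + k * a**2)        # residual variance of u given the k confounders
    return b * c * v / (b**2 * v + 1.0)


if __name__ == "__main__":
    for k in (0, 5, 20, 80):
        print(k, round(spurious_coefficient(k=k), 3), round(predicted_coefficient(k=k), 3))
```

With the defaults, `predicted_coefficient()` equals 1/7 ≈ 0.143: adding more correlated confounders shrinks the spurious association roughly like 1/k, but never removes it, because each confounder is only an imperfect proxy for the latent variable.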
Taxonomy
Topics: Advanced Causal Inference Techniques · Statistical Methods and Bayesian Inference · Survey Methodology and Nonresponse
