Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization
Simone Bombari, Marco Mondelli

TL;DR
This paper analyzes how regularization, simplicity bias, and over-parameterization influence spurious correlations in high-dimensional regression, providing a statistical framework and empirical validation across multiple datasets.
Contribution
It offers a theoretical characterization of spurious correlations in high-dimensional regression and explores the effects of regularization and over-parameterization on these correlations.
Findings
Spurious correlation magnitude depends on data covariance and regularization strength.
The regularization strength that minimizes the in-distribution test loss lies in a range where spurious correlations are increasing.
Over-parameterization via random features mimics regularized linear regression effects.
Abstract
Learning models have been shown to rely on spurious correlations between non-predictive features and the associated labels in the training data, with negative implications on robustness, bias and fairness. In this work, we provide a statistical characterization of this phenomenon for high-dimensional regression, when the data contains a predictive core feature and a spurious feature. Specifically, we quantify the amount of spurious correlations learned via linear regression, in terms of the data covariance and the strength of the ridge regularization. As a consequence, we first capture the simplicity of the spurious feature through the spectrum of its covariance, and its correlation with the core feature through the Schur complement of the full data covariance. Next, we prove a trade-off between the amount of spurious correlations and the in-distribution test loss, by showing that the value of the regularization strength that minimizes …
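The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's construction: the data model (a Gaussian core feature `x`, a spurious feature `y` correlated with it through a hypothetical parameter `rho`, labels that depend only on `x`) and the reliance measure `proxy` are assumptions chosen for illustration, and the function name `ridge_spurious_proxy` is made up. It fits ridge regression in closed form and reports how much of the learned predictor's spurious part co-varies with the noiseless label.

```python
import numpy as np

def ridge_spurious_proxy(lam, n=2000, d=20, rho=0.8, noise=0.1, seed=0):
    """Fit ridge regression on [core, spurious] features.

    Returns (proxy, theta_core, theta_spur). The data model and the
    proxy are illustrative assumptions, not the paper's definitions.
    """
    rng = np.random.default_rng(seed)
    # Core feature x; spurious feature y has unit variance per coordinate
    # and correlation rho with the matching coordinate of x.
    x = rng.standard_normal((n, d))
    y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal((n, d))
    # Labels depend on the core feature only.
    w_star = rng.standard_normal(d) / np.sqrt(d)
    labels = x @ w_star + noise * rng.standard_normal(n)
    # Closed-form ridge estimator on the concatenated features.
    Z = np.hstack([x, y])
    theta = np.linalg.solve(Z.T @ Z + n * lam * np.eye(2 * d), Z.T @ labels)
    theta_core, theta_spur = theta[:d], theta[d:]
    # Assumed proxy: population covariance between the spurious part of the
    # prediction (y @ theta_spur) and the noiseless label (x @ w_star).
    # Since Cov(y, x) = rho * I here, it equals rho * theta_spur @ w_star.
    proxy = rho * float(theta_spur @ w_star)
    return proxy, theta_core, theta_spur

if __name__ == "__main__":
    for lam in (1e-3, 1e-1, 1.0, 10.0):
        proxy, _, _ = ridge_spurious_proxy(lam)
        print(f"lambda={lam:g}  proxy={proxy:.4f}")
```

Sweeping `lam` shows how the assumed proxy responds to the regularization strength in this toy model. For very large `lam` the ridge weights shrink toward zero, so the proxy vanishes as well.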
Taxonomy
Topics: Statistical Methods and Inference
