Optimal ridge penalty for real-world high-dimensional data can be zero or negative due to the implicit ridge regularization
Dmitry Kobak, Jonathan Lomond, Benoit Sanchez

TL;DR
This paper shows that in high-dimensional linear regression the optimal ridge penalty can be zero or negative because of implicit regularization, challenging the conventional wisdom that large models require strong explicit regularization.
Contribution
It demonstrates through theory and experiments that negative ridge penalties can be optimal in high-dimensional settings, and connects implicit regularization with random covariates.
Findings
Optimal ridge penalty can be zero or negative in high-dimensional data.
Implicit regularization from low-variance directions can negate the need for positive ridge penalties.
A theoretical proof using a spiked covariance model shows that the optimal penalty is negative when n << p.
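The findings above can be explored numerically. The following is a minimal sketch, not the authors' code: it assumes a single-spike covariance Sigma = I + spike * v v^T with the true coefficient vector lying along the spike direction v, and all names and defaults (`ridge_dual`, `risk_curve`, `spike`, `noise_sd`) are hypothetical. It evaluates the exact prediction risk of the ridge estimator over a grid of penalties that includes negative values.

```python
import numpy as np


def ridge_dual(X, y, lam):
    """Ridge estimator in dual form: X^T (X X^T + lam * I_n)^{-1} y.

    For n < p this is well defined for any lam greater than minus the
    smallest eigenvalue of X X^T, so negative penalties are allowed.
    lam = 0 gives the minimum-norm least squares estimator.
    """
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)


def risk_curve(n=64, p=1000, spike=100.0, noise_sd=1.0, num_lams=21, seed=0):
    """Exact prediction risk of ridge over a penalty grid (illustrative setup)."""
    rng = np.random.default_rng(seed)
    v = np.ones(p) / np.sqrt(p)   # unit-norm spike direction (assumption)
    beta = v.copy()               # signal along the high-variance direction (assumption)

    # Rows of X have covariance Sigma = I + spike * v v^T.
    Z = rng.standard_normal((n, p))
    g = rng.standard_normal(n)
    X = Z + np.sqrt(spike) * np.outer(g, v)
    y = X @ beta + noise_sd * rng.standard_normal(n)

    # Keep the grid above -min_eig so that X X^T + lam * I stays positive definite.
    min_eig = np.linalg.eigvalsh(X @ X.T)[0]
    lams = np.linspace(-0.9 * min_eig, 2.0 * min_eig, num_lams)

    risks = []
    for lam in lams:
        d = ridge_dual(X, y, lam) - beta
        # d^T Sigma d plus the irreducible noise variance.
        risks.append(d @ d + spike * (d @ v) ** 2 + noise_sd ** 2)
    return lams, np.array(risks)


if __name__ == "__main__":
    lams, risks = risk_curve()
    best = int(np.argmin(risks))
    print(f"penalty with lowest risk on this grid: {lams[best]:.1f}")
```

Whether the minimizer on the grid is negative depends on the assumed spike strength, noise level, and random seed, so the sketch is a way to probe the claim rather than a reproduction of the paper's experiments.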
Abstract
A conventional wisdom in statistical learning is that large models require strong regularization to prevent overfitting. Here we show that this rule can be violated by linear regression in the underdetermined situation under realistic conditions. Using simulations and real-life high-dimensional data sets, we demonstrate that an explicit positive ridge penalty can fail to provide any improvement over the minimum-norm least squares estimator. Moreover, the optimal value of ridge penalty in this situation can be negative. This happens when the high-variance directions in the predictor space can predict the response variable, which is often the case in the real-world high-dimensional data. In this regime, low-variance directions provide an implicit ridge regularization and can make any further positive ridge penalty detrimental. We prove that augmenting any linear model with random…
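For reference, the two estimators named in the abstract can be written with their standard textbook definitions. The notation here is an assumption and is not taken from the paper: X is the n × p design matrix with n < p and full row rank, y is the response vector, and λ is the ridge penalty.

```latex
% Ridge estimator, primal and dual forms (equal for \lambda > 0)
\hat\beta_\lambda
  = (X^\top X + \lambda I_p)^{-1} X^\top y
  = X^\top (X X^\top + \lambda I_n)^{-1} y

% Minimum-norm least squares estimator as the limit of vanishing penalty
\hat\beta_0
  = \lim_{\lambda \to 0^+} \hat\beta_\lambda
  = X^{+} y
  = X^\top (X X^\top)^{-1} y

% The dual form stays well defined for negative penalties as long as
\lambda > -\lambda_{\min}(X X^\top)
```

The last condition is what makes a negative penalty meaningful when n < p: the n × n matrix X X^T + λ I_n remains positive definite for every λ above minus the smallest eigenvalue of X X^T.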
Taxonomy
Topics: Statistical Methods and Inference · Sparse and Compressive Sensing Techniques · Gaussian Processes and Bayesian Inference
Methods: Linear Regression
