Regularization-wise double descent: Why it occurs and how to eliminate it
Fatih Furkan Yilmaz, Reinhard Heckel

TL;DR
This paper investigates double descent in the risk of L2-regularized models as a function of the regularization strength. It shows that the effect can be mitigated by tuning the regularization strength of each part of the model separately, and supports this with experiments on linear models, neural networks, and CNNs.
Contribution
The paper demonstrates that the risk can be double-descent shaped as a function of the regularization strength, and it proposes eliminating this behavior by adjusting the regularization strength of each model component.
Findings
The risk of L2-regularized models can be double-descent shaped as a function of the regularization strength.
Scaling the regularization strength of each model part appropriately can eliminate double descent in linear and neural network models.
Experiments on CNNs and ResNet-18 confirm the phenomenon and its mitigation.
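The superposition idea behind these findings can be illustrated with a small numerical sketch. It is not the authors' experimental setup: it assumes a toy diagonal linear model with two coordinates, where coordinate i has feature scale s2[i], squared true coefficient theta2[i], and unit noise variance, and where the risk is the sum of the scale-weighted squared errors. The names part_risk and count_local_minima, and all numeric values, are hypothetical choices made for illustration.

```python
import numpy as np


def part_risk(lam, s2, theta2, noise_var=1.0):
    """Scale-weighted expected squared error of ridge on one coordinate.

    Toy diagonal model (an assumption of this sketch): the ridge estimate is
    theta_hat = s2 / (s2 + lam) * theta + sqrt(s2) / (s2 + lam) * eps,
    with eps ~ N(0, noise_var). Returns s2 * E[(theta_hat - theta)^2].
    """
    bias2 = (lam / (s2 + lam)) ** 2 * theta2          # shrinkage bias
    variance = noise_var * s2 / (s2 + lam) ** 2       # noise passed through
    return s2 * (bias2 + variance)


def count_local_minima(values):
    """Count strict interior local minima of a sampled 1-D curve."""
    v = np.asarray(values)
    return int(np.sum((v[1:-1] < v[:-2]) & (v[1:-1] < v[2:])))


# Two model parts with very different feature scales (illustrative values).
s2 = np.array([1e-2, 1e2])
theta2 = 1.0 / s2                      # chosen so each part has a clear U shape
grid = np.logspace(-6, 6, 1201)        # sweep over regularization strength

# One shared lambda for both parts: the two U-shaped curves are superposed.
shared = np.array([part_risk(lam, s2, theta2).sum() for lam in grid])

# Per-part lambda proportional to the feature scale: lam_i = u * s2[i].
scaled = np.array([part_risk(u * s2, s2, theta2).sum() for u in grid])

print("local minima, shared lambda:", count_local_minima(shared))
print("local minima, scaled lambda:", count_local_minima(scaled))
print("best risk, shared vs scaled:", shared.min(), scaled.min())
```

In this toy setting each part has its own U-shaped bias-variance tradeoff, with a minimum at a different regularization strength. Under a shared strength the two curves add up to a risk with two separate dips, whereas scaling the strength per part aligns the two minima. How pronounced the effect is depends on the assumed scales and coefficients.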
Abstract
The risk of overparameterized models, in particular deep neural networks, is often double-descent shaped as a function of the model size. Recently, it was shown that the risk as a function of the early-stopping time can also be double-descent shaped, and this behavior can be explained as a superposition of bias-variance tradeoffs. In this paper, we show that the risk of explicit L2-regularized models can exhibit double descent behavior as a function of the regularization strength, both in theory and practice. We find that for linear regression, a double-descent-shaped risk is caused by a superposition of bias-variance tradeoffs corresponding to different parts of the model and can be mitigated by scaling the regularization strength of each part appropriately. Motivated by this result, we study a two-layer neural network and show that double descent can be eliminated by adjusting the…
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
