Memorizing without overfitting: Bias, variance, and interpolation in over-parameterized models
Jason W. Rocks, Pankaj Mehta

TL;DR
This paper uses statistical physics to analyze bias and variance in over-parameterized models, revealing phase transitions and challenging classical bias-variance trade-off assumptions in deep learning.
Contribution
It provides analytic expressions for bias and variance in minimal over-parameterized models, elucidating their behavior and generalization properties beyond classical theory.
Findings
Increasing the number of fit parameters leads to a phase transition at which the training error reaches zero and the test error diverges because the variance diverges.
In the over-parameterized regime of the neural network model, test error decreases because bias and variance both decrease, contrary to the classical trade-off.
Overfitting can occur without noise, and bias persists even when student and teacher models match.
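The findings above can be probed numerically. The following is a minimal sketch, not the paper's analytic calculation: it estimates training error, squared bias, and variance by Monte Carlo for least-squares linear regression with a matching linear teacher. The function name `bias_variance`, the Gaussian features, the minimum-norm (pseudoinverse) solution, and all default values are assumptions chosen for illustration.

```python
import numpy as np

def bias_variance(n_train, n_params, noise_std=0.5, n_trials=200,
                  n_test=500, seed=0):
    """Monte Carlo estimate of (train error, bias^2, variance).

    Assumed setup (illustrative only): Gaussian features, a linear teacher
    with the same features as the student, and a least-squares fit that
    becomes the minimum-norm interpolator once n_params > n_train.
    """
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=n_params) / np.sqrt(n_params)  # teacher weights
    X_test = rng.normal(size=(n_test, n_params))
    f_test = X_test @ beta                                 # noiseless teacher output

    preds = np.empty((n_trials, n_test))
    train_err = np.empty(n_trials)
    for t in range(n_trials):
        # Draw a fresh training set for every trial.
        X = rng.normal(size=(n_train, n_params))
        y = X @ beta + noise_std * rng.normal(size=n_train)
        w = np.linalg.pinv(X) @ y                          # least-squares / min-norm fit
        preds[t] = X_test @ w
        train_err[t] = np.mean((X @ w - y) ** 2)

    mean_pred = preds.mean(axis=0)                         # average over training sets
    bias_sq = np.mean((mean_pred - f_test) ** 2)
    variance = np.mean(preds.var(axis=0))
    return train_err.mean(), bias_sq, variance

if __name__ == "__main__":
    n_train = 40
    for n_params in (10, 20, 40, 80, 160):
        tr, b2, var = bias_variance(n_train, n_params)
        print(f"params/samples={n_params / n_train:4.2f}  "
              f"train={tr:.3e}  bias^2={b2:.3f}  variance={var:.3f}")
```

Under these assumptions, the variance estimate should spike when the number of parameters equals the number of training points, the training error should vanish beyond that point, and a nonzero bias should appear in the over-parameterized regime even though the teacher and student share the same features. Setting `noise_std=0.0` gives a quick check of whether the test error still grows without label noise.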
Abstract
The bias-variance trade-off is a central concept in supervised learning. In classical statistics, increasing the complexity of a model (e.g., number of parameters) reduces bias but also increases variance. Until recently, it was commonly believed that optimal performance is achieved at intermediate model complexities which strike a balance between bias and variance. Modern Deep Learning methods flout this dogma, achieving state-of-the-art performance using "over-parameterized models" where the number of fit parameters is large enough to perfectly fit the training data. As a result, understanding bias and variance in over-parameterized models has emerged as a fundamental problem in machine learning. Here, we use methods from statistical physics to derive analytic expressions for bias and variance in two minimal models of over-parameterization (linear regression and two-layer neural…
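For reference, the trade-off described in the abstract rests on the standard textbook decomposition of the expected squared test error. The notation below is an assumption for illustration and may differ from the paper's: labels are generated as $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma_\varepsilon^2$ that is independent of the training set $\mathcal{D}$, and $\hat{y}_{\mathcal{D}}(x)$ is the prediction of the model fit on $\mathcal{D}$.

```latex
\mathbb{E}_{\mathcal{D},\varepsilon}\!\left[\bigl(y - \hat{y}_{\mathcal{D}}(x)\bigr)^2\right]
= \underbrace{\bigl(f(x) - \mathbb{E}_{\mathcal{D}}[\hat{y}_{\mathcal{D}}(x)]\bigr)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_{\mathcal{D}}\!\left[\bigl(\hat{y}_{\mathcal{D}}(x) - \mathbb{E}_{\mathcal{D}}[\hat{y}_{\mathcal{D}}(x)]\bigr)^2\right]}_{\text{variance}}
+ \underbrace{\sigma_\varepsilon^2}_{\text{noise}}
```

The cross terms vanish because the noise has zero mean and is independent of $\mathcal{D}$. The bias term measures how far the prediction averaged over training sets is from the true function, and the variance term measures how much individual fits fluctuate around that average.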
