Memorizing without overfitting: Bias, variance, and interpolation in over-parameterized models

Jason W. Rocks; Pankaj Mehta

arXiv:2010.13933·stat.ML·March 25, 2022

TL;DR

This paper uses statistical physics to analyze bias and variance in over-parameterized models, revealing phase transitions and challenging classical bias-variance trade-off assumptions in deep learning.

Contribution

The paper derives analytic expressions for bias and variance in two minimal models of over-parameterization (linear regression and two-layer neural networks), clarifying how these models generalize in regimes that classical theory does not cover.

Findings

01

Increasing the number of fit parameters leads to a phase transition at which the training error reaches zero and the test error diverges because of the variance.

02

In over-parameterized neural networks, test error decreases with added parameters because bias and variance both decrease.

03

Overfitting can occur without noise, and bias persists even when student and teacher models match.
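
The findings above can be explored numerically with a small simulation. The following is a minimal sketch, not the paper's analytic calculation: it assumes a linear teacher with Gaussian inputs and label noise, a linear student with the same features fit by the minimum-norm least-squares solution (via the pseudoinverse), and a Monte Carlo estimate of bias and variance over resampled training sets. The function name `bias_variance` and all parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np


def bias_variance(n_params, n_train=40, n_test=200, n_trials=200,
                  noise_std=0.5, seed=0):
    """Monte Carlo estimate of training error, squared bias, and variance
    for minimum-norm linear regression (all settings are assumptions)."""
    rng = np.random.default_rng(seed)

    # Teacher weights, scaled so the clean label has variance close to 1.
    beta = rng.normal(size=n_params) / np.sqrt(n_params)

    # Fixed test inputs with noise-free teacher labels.
    X_test = rng.normal(size=(n_test, n_params))
    y_test_clean = X_test @ beta

    preds = np.empty((n_trials, n_test))
    train_err = np.empty(n_trials)
    for t in range(n_trials):
        # Draw a fresh training set for each trial.
        X = rng.normal(size=(n_train, n_params))
        y = X @ beta + noise_std * rng.normal(size=n_train)

        # The pseudoinverse gives the least-squares fit when
        # n_params < n_train and the minimum-norm interpolating
        # solution when n_params >= n_train.
        w = np.linalg.pinv(X) @ y

        train_err[t] = np.mean((X @ w - y) ** 2)
        preds[t] = X_test @ w

    # Squared bias: average prediction versus the clean label.
    bias_sq = np.mean((preds.mean(axis=0) - y_test_clean) ** 2)
    # Variance: spread of predictions across training sets.
    variance = np.mean(preds.var(axis=0))
    return train_err.mean(), bias_sq, variance


# Below, at, and above the interpolation threshold n_params == n_train.
results = {p: bias_variance(p) for p in (10, 40, 160)}
for p, (train, bias_sq, var) in results.items():
    print(f"n_params={p:4d}  train={train:.3e}  "
          f"bias^2={bias_sq:.3e}  variance={var:.3e}")
```

In this sketch the training error should fall to numerical zero once the number of parameters reaches the number of training points, the estimated variance should peak near that threshold, and a nonzero bias should remain in the over-parameterized case even though the student and teacher have the same form. Exact values depend on the seed and the number of trials.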

Abstract

The bias-variance trade-off is a central concept in supervised learning. In classical statistics, increasing the complexity of a model (e.g., number of parameters) reduces bias but also increases variance. Until recently, it was commonly believed that optimal performance is achieved at intermediate model complexities which strike a balance between bias and variance. Modern Deep Learning methods flout this dogma, achieving state-of-the-art performance using "over-parameterized models" where the number of fit parameters is large enough to perfectly fit the training data. As a result, understanding bias and variance in over-parameterized models has emerged as a fundamental problem in machine learning. Here, we use methods from statistical physics to derive analytic expressions for bias and variance in two minimal models of over-parameterization (linear regression and two-layer neural…
