Reconciling modern machine learning practice and the bias-variance trade-off
Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal

TL;DR
This paper introduces the double descent risk curve, which reconciles the classical bias-variance trade-off with modern practice, where highly complex models such as neural networks interpolate the training data yet still achieve high test accuracy.
Contribution
It presents a unified performance curve, called double descent, showing that increasing model capacity beyond the interpolation point can improve test accuracy, a regime the classical bias-variance picture does not cover.
Findings
Double descent is observed across various models and datasets.
Increasing model capacity beyond the interpolation point can reduce test error.
Classical bias-variance trade-off does not fully explain modern model behavior.
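The findings above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' experimental code: it assumes random Fourier features fitted by minimum-norm least squares on synthetic data. The function names (`rff_test_error`, `double_descent_curve`), the sine target, the noise level, and all sizes are illustrative assumptions. Under these assumptions, the averaged test error tends to spike when the number of features is close to the number of training points (the interpolation threshold) and to fall again as capacity keeps growing.

```python
import numpy as np


def rff_test_error(n_features, x_train, y_train, x_test, y_test, rng):
    """Fit random Fourier features by least squares and return the test MSE."""
    d = x_train.shape[1]
    # Random frequencies and phases define the (hypothetical) feature map.
    w = rng.normal(size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    phi_train = np.cos(x_train @ w + b)
    phi_test = np.cos(x_test @ w + b)
    # When there are more features than samples, lstsq returns the
    # minimum-norm solution among all coefficient vectors that fit the data.
    coef, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)
    return float(np.mean((phi_test @ coef - y_test) ** 2))


def double_descent_curve(
    n_train=40,
    n_test=500,
    d=3,
    feature_counts=(5, 10, 20, 40, 80, 400, 2000),
    n_trials=20,
    noise=0.5,
    seed=0,
):
    """Average test MSE over several random trials for each feature count."""
    rng = np.random.default_rng(seed)
    totals = np.zeros(len(feature_counts))
    for _ in range(n_trials):
        # Synthetic regression task: noisy training labels, clean test labels.
        beta = rng.normal(size=d)
        x_train = rng.normal(size=(n_train, d))
        x_test = rng.normal(size=(n_test, d))
        y_train = np.sin(x_train @ beta) + noise * rng.normal(size=n_train)
        y_test = np.sin(x_test @ beta)
        for i, n_features in enumerate(feature_counts):
            totals[i] += rff_test_error(
                n_features, x_train, y_train, x_test, y_test, rng
            )
    return {n: float(t / n_trials) for n, t in zip(feature_counts, totals)}


if __name__ == "__main__":
    for n_features, err in double_descent_curve().items():
        print(f"features={n_features:5d}  mean test MSE={err:.3f}")
```

With 40 training points, the entry for 40 features marks the interpolation threshold in this sketch. Comparing it with the entries for much smaller and much larger feature counts shows the two descents. The exact values depend on the seed and on the assumed data distribution.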
Abstract
Breakthroughs in machine learning are rapidly changing science and society, yet our fundamental understanding of this technology has lagged far behind. Indeed, one of the central tenets of the field, the bias-variance trade-off, appears to be at odds with the observed behavior of methods used in the modern machine learning practice. The bias-variance trade-off implies that a model should balance under-fitting and over-fitting: rich enough to express underlying structure in data, simple enough to avoid fitting spurious patterns. However, in the modern practice, very rich models such as neural networks are trained to exactly fit (i.e., interpolate) the data. Classically, such models would be considered over-fit, and yet they often obtain high accuracy on test data. This apparent contradiction has raised questions about the mathematical foundations of machine learning and their relevance…
Videos
Reconciling modern machine learning and the bias-variance trade-off · YouTube
Taxonomy
Topics: Machine Learning and Data Classification · Neural Networks and Applications · Gaussian Processes and Bayesian Inference
