Understanding Double Descent Requires a Fine-Grained Bias-Variance Decomposition
Ben Adlam, Jeffrey Pennington

TL;DR
This paper introduces a fine-grained bias-variance decomposition for deep learning models. It shows that the individual variance components behave non-monotonically and can diverge at the interpolation boundary, and that ensemble methods mitigate this divergence.
Contribution
It provides a novel, interpretable bias-variance decomposition accounting for multiple sources of randomness and analyzes its high-dimensional asymptotics in kernel regression.
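To make "symmetric decomposition" concrete, here is a minimal sketch for just two independent sources of randomness, the training sample D and the initialization θ. The notation and the ANOVA-style form are assumptions chosen for illustration; the paper's own definitions, which also treat label noise as a separate source, may differ in detail.

```latex
% Sketch under assumed notation: \hat f(x) is the prediction at a test point x,
% D is the training sample, \theta is the random initialization, D and \theta independent.
\begin{align}
V &= \operatorname{Var}_{D,\theta}\big[\hat f(x)\big], \\
V_D &= \operatorname{Var}_{D}\big[\mathbb{E}_{\theta}[\hat f(x) \mid D]\big], \qquad
V_\theta = \operatorname{Var}_{\theta}\big[\mathbb{E}_{D}[\hat f(x) \mid \theta]\big], \\
V_{D\theta} &= V - V_D - V_\theta \;\ge\; 0 .
\end{align}
% Averaging predictions over \theta (an infinite ensemble) yields the predictor
% \mathbb{E}_{\theta}[\hat f(x) \mid D], whose variance is V_D alone:
% both V_\theta and the interaction term V_{D\theta} are removed.
```

The decomposition is symmetric in the sense that swapping the roles of D and θ leaves the set of terms unchanged, unlike a sequential application of the law of total variance, whose terms depend on the order of conditioning.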
Findings
Bias decreases monotonically with network width.
Variance terms show non-monotonic behavior and can diverge.
The divergence is caused by the interaction between sampling and initialization and can be mitigated by ensemble methods.
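The findings above can be explored empirically with a small simulation. The following is a minimal sketch, not the paper's method: it assumes a ReLU random-feature ridge regression with a linear teacher, folds label noise into the sampling seed for simplicity, and estimates a two-way ANOVA-style split of the prediction variance on a finite grid of seeds. All function names and parameter values are hypothetical.

```python
import numpy as np

def random_feature_predictions(n_train=40, n_feat=40, d=20, n_test=50,
                               n_data_seeds=8, n_init_seeds=8,
                               ridge=1e-3, noise=0.1):
    """Return preds[s, i, t]: prediction at test point t for data seed s, init seed i."""
    rng = np.random.default_rng(0)
    beta = rng.standard_normal(d) / np.sqrt(d)        # fixed linear teacher (assumed)
    X_test = rng.standard_normal((n_test, d))         # fixed test inputs
    preds = np.empty((n_data_seeds, n_init_seeds, n_test))
    for s in range(n_data_seeds):
        data_rng = np.random.default_rng(1000 + s)    # sampling randomness (incl. label noise)
        X = data_rng.standard_normal((n_train, d))
        y = X @ beta + noise * data_rng.standard_normal(n_train)
        for i in range(n_init_seeds):
            init_rng = np.random.default_rng(2000 + i)  # initialization randomness
            W = init_rng.standard_normal((d, n_feat)) / np.sqrt(d)
            F = np.maximum(X @ W, 0.0)                # ReLU random features (train)
            F_test = np.maximum(X_test @ W, 0.0)      # same features on test inputs
            A = F.T @ F + ridge * np.eye(n_feat)
            coef = np.linalg.solve(A, F.T @ y)        # ridge regression on the features
            preds[s, i] = F_test @ coef
    return preds

def decompose_variance(preds):
    """Two-way ANOVA split of the variance over the seed grid, averaged over test points."""
    grand = preds.mean(axis=(0, 1), keepdims=True)    # mean over both sources
    by_data = preds.mean(axis=1, keepdims=True)       # averaged over inits (an ensemble)
    by_init = preds.mean(axis=0, keepdims=True)       # averaged over data samples
    return {
        "total": ((preds - grand) ** 2).mean(),
        "sampling": ((by_data - grand) ** 2).mean(),
        "init": ((by_init - grand) ** 2).mean(),
        "interaction": ((preds - by_data - by_init + grand) ** 2).mean(),
    }

if __name__ == "__main__":
    # Sweep the width across the interpolation boundary n_feat = n_train = 40.
    for n_feat in (10, 40, 160):
        v = decompose_variance(random_feature_predictions(n_feat=n_feat))
        print(n_feat, {k: round(float(x), 4) for k, x in v.items()})
```

On a balanced seed grid the three components sum exactly to the total. The "sampling" entry is also the variance of the predictor averaged over initializations, so comparing it with "total" indicates how much an ensemble over initializations would remove. With only a handful of seeds these are noisy estimates, and the main-effect terms are biased upward by a share of the interaction term.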
Abstract
Classical learning theory suggests that the optimal generalization performance of a machine learning model should occur at an intermediate model complexity, with simpler models exhibiting high bias and more complex models exhibiting high variance of the predictive function. However, such a simple trade-off does not adequately describe deep learning models that simultaneously attain low bias and variance in the heavily overparameterized regime. A primary obstacle in explaining this behavior is that deep learning algorithms typically involve multiple sources of randomness whose individual contributions are not visible in the total variance. To enable fine-grained analysis, we describe an interpretable, symmetric decomposition of the variance into terms associated with the randomness from sampling, initialization, and the labels. Moreover, we compute the high-dimensional asymptotic…
Taxonomy
Topics: Machine Learning in Materials Science · Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification
