Understanding Double Descent Requires a Fine-Grained Bias-Variance Decomposition

Ben Adlam; Jeffrey Pennington

arXiv:2011.03321 · stat.ML · November 9, 2020 · 22 citations

TL;DR

This paper introduces a fine-grained bias-variance decomposition for deep learning models that separates the variance due to sampling, initialization, and the labels. The variance terms behave non-monotonically and can diverge at the interpolation boundary; ensemble methods mitigate this divergence.

Contribution

The paper provides a novel, interpretable, symmetric bias-variance decomposition that accounts for multiple sources of randomness (sampling, initialization, and the labels), and analyzes its high-dimensional asymptotics in kernel regression.
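As an illustration of what a symmetric decomposition over several sources of randomness can look like, the block below writes out an ANOVA-style split of the total variance. This is a sketch under stated assumptions: the symbols $P$ (initialization), $X$ (sampling), and $\varepsilon$ (label noise), and the notation $V_{\cdot}$, are chosen here for illustration and should be checked against the paper's own definitions.

```latex
% Assumed notation: \hat{y} is the prediction at a fixed test point;
% P, X, \varepsilon are independent sources of randomness.
\begin{align}
  V &= \operatorname{Var}_{P,X,\varepsilon}\!\left[\hat{y}\right]
     = V_P + V_X + V_\varepsilon
       + V_{PX} + V_{P\varepsilon} + V_{X\varepsilon}
       + V_{PX\varepsilon}, \\
  V_a &= \operatorname{Var}_a\!\left(\mathbb{E}\left[\hat{y} \mid a\right]\right),
     \qquad a \in \{P, X, \varepsilon\}, \\
  V_{ab} &= \operatorname{Var}_{a,b}\!\left(\mathbb{E}\left[\hat{y} \mid a, b\right]\right)
     - V_a - V_b, \\
  V_{PX\varepsilon} &= V - V_P - V_X - V_\varepsilon
     - V_{PX} - V_{P\varepsilon} - V_{X\varepsilon}.
\end{align}
```

Each term is defined without choosing an order in which to condition on the sources, which is what makes the split symmetric; a sequential decomposition such as $\operatorname{Var}_X(\mathbb{E}_P[\hat{y}]) + \mathbb{E}_X[\operatorname{Var}_P(\hat{y})]$ depends on that order.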

Findings

01. Bias decreases monotonically with network width.

02. The variance terms show non-monotonic behavior and can diverge at the interpolation boundary.

03. The divergence is caused by the interaction between sampling and initialization and is mitigated by ensemble methods.
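The findings above can be probed numerically with a small Monte Carlo experiment. The sketch below is not the paper's asymptotic calculation: it fits random-feature ridge regression on a grid of data seeds and initialization seeds and splits the empirical variance into a sampling term, an initialization term, and their interaction. The ReLU features, the noiseless linear teacher, the problem sizes, and the names `rf_predict` and `decompose` are all assumptions made for illustration, and label noise is omitted for brevity.

```python
import numpy as np


def rf_predict(data_seed, init_seed, x_test, beta,
               n_train=30, n_features=30, ridge=1e-3):
    """Fit random-feature ridge regression and predict at x_test."""
    dim = beta.shape[0]
    # Sampling randomness: the training inputs (stream 0).
    x_train = np.random.default_rng([0, data_seed]).standard_normal((n_train, dim))
    y_train = x_train @ beta / np.sqrt(dim)  # noiseless linear teacher (assumption)
    # Initialization randomness: the fixed first-layer weights (stream 1).
    w = np.random.default_rng([1, init_seed]).standard_normal((n_features, dim))

    def feats(x):
        return np.maximum(x @ w.T / np.sqrt(dim), 0.0)  # ReLU random features

    f_train = feats(x_train)
    gram = f_train.T @ f_train + ridge * np.eye(n_features)
    coef = np.linalg.solve(gram, f_train.T @ y_train)
    return feats(x_test) @ coef


def decompose(n_data=8, n_init=8, n_test=50, dim=15, **kwargs):
    """Two-way split of the empirical variance on a balanced seed grid."""
    rng = np.random.default_rng(12345)
    beta = rng.standard_normal(dim)
    x_test = rng.standard_normal((n_test, dim))
    y_test = x_test @ beta / np.sqrt(dim)

    # preds has shape (n_data, n_init, n_test).
    preds = np.array([[rf_predict(s, i, x_test, beta, **kwargs)
                       for i in range(n_init)] for s in range(n_data)])
    grand = preds.mean(axis=(0, 1))   # mean over both sources
    by_data = preds.mean(axis=1)      # averaged over initializations
    by_init = preds.mean(axis=0)      # averaged over training samples
    resid = preds - by_data[:, None, :] - by_init[None, :, :] + grand

    return {
        "bias_sq": np.mean((grand - y_test) ** 2),
        "v_sampling": np.mean((by_data - grand) ** 2),
        "v_init": np.mean((by_init - grand) ** 2),
        "v_interaction": np.mean(resid ** 2),
        "v_total": np.mean((preds - grand) ** 2),
    }


if __name__ == "__main__":
    # Sweep the width through the interpolation boundary (n_features == n_train).
    for width in (10, 30, 90):
        terms = decompose(n_features=width)
        print(width, {k: float(v) for k, v in terms.items()})
```

On a balanced grid the three variance terms sum to `v_total` exactly (up to floating-point error). Averaging predictions over initialization seeds, as an ensemble would, leaves `v_sampling` as the only remaining variance, so the interaction term is removed by construction. With grids this small the estimates are noisy and the plug-in estimators are biased, so the numbers are indicative only.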

Abstract

Classical learning theory suggests that the optimal generalization performance of a machine learning model should occur at an intermediate model complexity, with simpler models exhibiting high bias and more complex models exhibiting high variance of the predictive function. However, such a simple trade-off does not adequately describe deep learning models that simultaneously attain low bias and variance in the heavily overparameterized regime. A primary obstacle in explaining this behavior is that deep learning algorithms typically involve multiple sources of randomness whose individual contributions are not visible in the total variance. To enable fine-grained analysis, we describe an interpretable, symmetric decomposition of the variance into terms associated with the randomness from sampling, initialization, and the labels. Moreover, we compute the high-dimensional asymptotic…

Videos

Understanding Double Descent Requires A Fine-Grained Bias-Variance Decomposition (SlidesLive)

Taxonomy

Topics: Machine Learning in Materials Science · Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification