Phenomenology of Double Descent in Finite-Width Neural Networks
Sidak Pal Singh, Aurelien Lucchi, Thomas Hofmann, Bernhard Schölkopf

TL;DR
This paper investigates the double descent phenomenon in finite-width neural networks, revealing how the loss function influences generalization and connecting the behavior to the Hessian spectrum at the interpolation threshold.
Contribution
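It introduces an influence-function-based analysis that yields expressions for the population loss and its lower bound under minimal assumptions on the parametric model, making explicit how the loss function and the Hessian spectrum at the optimum shape double descent.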
Findings
Double descent occurs at the interpolation threshold in finite-width neural networks.
The loss function significantly impacts the double descent behavior.
Hessian spectrum analysis reveals properties of neural networks near the interpolation point.
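As a rough illustration of the findings above, the following sketch uses a toy linear regression, not the paper's finite-width networks or its influence-function bounds. It assumes Gaussian features, a squared loss, and a minimum-norm least-squares fit; the function name `sweep` and all parameter values are hypothetical choices made for this example. For the loss (1/2n)·||Xw − y||², the Hessian is XᵀX/n, so its smallest nonzero eigenvalue can be read off the singular values of X. In this toy setting the test error typically peaks when the number of fitted features equals the number of training points, which is also where that eigenvalue is smallest.

```python
import numpy as np


def sweep(n_train=40, d_total=120, widths=(20, 40, 80),
          noise=0.5, n_trials=20, seed=0):
    """Toy double-descent sweep (illustrative assumption, not the paper's setup).

    For each width p, fit min-norm least squares on the first p features and
    record the median test MSE and the median smallest nonzero eigenvalue of
    the squared-loss Hessian X^T X / n_train.
    """
    rng = np.random.default_rng(seed)
    # Ground-truth weights spread over all d_total features (norm about 1).
    beta = rng.normal(size=d_total) / np.sqrt(d_total)
    results = {}
    for p in widths:
        errs, eigs = [], []
        for _ in range(n_trials):
            X = rng.normal(size=(n_train, d_total))
            y = X @ beta + noise * rng.normal(size=n_train)
            X_test = rng.normal(size=(2000, d_total))
            y_test = X_test @ beta + noise * rng.normal(size=2000)

            # lstsq returns the minimum-norm solution when p > n_train.
            w, *_ = np.linalg.lstsq(X[:, :p], y, rcond=None)
            errs.append(np.mean((X_test[:, :p] @ w - y_test) ** 2))

            # Nonzero Hessian eigenvalues are the squared singular values / n.
            s = np.linalg.svd(X[:, :p], compute_uv=False)
            eigs.append(s.min() ** 2 / n_train)
        results[p] = (float(np.median(errs)), float(np.median(eigs)))
    return results


if __name__ == "__main__":
    for p, (err, eig) in sweep().items():
        print(f"p={p:3d}  median test MSE={err:10.3f}  "
              f"median min nonzero Hessian eig={eig:.5f}")
```

Medians over trials are used because the test error at the threshold is heavy-tailed. The sketch only shows the qualitative link between a vanishing Hessian eigenvalue and the error peak; it says nothing about how the choice of loss changes the picture, which is the part specific to the paper.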
Abstract
"Double descent" delineates the generalization behaviour of models depending on the regime they belong to: under- or over-parameterized. The current theoretical understanding of why this phenomenon occurs is based primarily on linear and kernel regression models, with informal parallels to neural networks via the Neural Tangent Kernel. Such analyses therefore do not adequately capture the mechanisms behind double descent in finite-width neural networks, and they disregard crucial components such as the choice of the loss function. We address these shortcomings by leveraging influence functions to derive suitable expressions for the population loss and its lower bound, while imposing minimal assumptions on the form of the parametric model. Our derived bounds bear an intimate connection with the spectrum of the Hessian at the optimum, and importantly, exhibit a…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and ELM
