Towards Understanding Epoch-wise Double descent in Two-layer Linear Neural Networks
Amanda Olmin, Fredrik Lindsten

TL;DR
This paper investigates the phenomenon of epoch-wise double descent in two-layer linear neural networks, providing theoretical insights into its mechanisms and identifying key factors influencing this behavior beyond linear regression.
Contribution
It derives a gradient flow model for two-layer linear networks and identifies new factors affecting double descent, extending understanding beyond simple linear regression models.
Findings
Double descent is influenced by the singular values of the input-output covariance matrix.
Gradient flow bridges linear regression and two-layer network dynamics.
Additional factors for double descent emerge with the extra layer.
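As a rough illustration of how the two settings relate, the gradient flow equations for squared loss can be written side by side. The notation is an assumption made for this sketch and not necessarily the paper's: inputs $X \in \mathbb{R}^{n \times d}$, targets $Y \in \mathbb{R}^{n \times m}$, empirical covariances $\Sigma_{xx} = \tfrac{1}{n} X^\top X$ and $\Sigma_{xy} = \tfrac{1}{n} X^\top Y$, a regression weight matrix $W$, and two-layer weights $W_1 \in \mathbb{R}^{d \times h}$, $W_2 \in \mathbb{R}^{h \times m}$.

```latex
% Linear regression: loss and gradient flow
\mathcal{L}(W) = \frac{1}{2n} \lVert Y - X W \rVert_F^2,
\qquad
\frac{\mathrm{d}W}{\mathrm{d}t} = -\nabla_W \mathcal{L} = \Sigma_{xy} - \Sigma_{xx} W .

% Two-layer linear network: same loss with W replaced by the product W_1 W_2
\mathcal{L}(W_1, W_2) = \frac{1}{2n} \lVert Y - X W_1 W_2 \rVert_F^2,

\frac{\mathrm{d}W_1}{\mathrm{d}t} = \left( \Sigma_{xy} - \Sigma_{xx} W_1 W_2 \right) W_2^\top,
\qquad
\frac{\mathrm{d}W_2}{\mathrm{d}t} = W_1^\top \left( \Sigma_{xy} - \Sigma_{xx} W_1 W_2 \right).
```

In both cases the dynamics are driven by the same residual term $\Sigma_{xy} - \Sigma_{xx} W$ (with $W = W_1 W_2$ for the network). The extra layer multiplies that residual by the current weights, which makes the dynamics nonlinear in the parameters and is one plausible route by which additional factors can enter.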
Abstract
Epoch-wise double descent is the phenomenon where generalisation performance improves beyond the point of overfitting, resulting in a generalisation curve exhibiting two descents over the course of learning. Understanding the mechanisms driving this behaviour is crucial not only for understanding the generalisation behaviour of machine learning models in general, but also for employing conventional selection methods, such as the use of early stopping to mitigate overfitting. While we ultimately want to draw conclusions about more complex models, such as deep neural networks, a majority of theoretical results regarding the underlying cause of epoch-wise double descent are based on simple models, such as standard linear regression. In this paper, to take a step towards more complex models in theoretical analysis, we study epoch-wise double descent in two-layer linear neural networks. First,…
Taxonomy
Topics: Neural Networks and Applications · Neural Networks and Reservoir Computing
Methods: Linear Regression · Early Stopping
