Towards Understanding Epoch-wise Double descent in Two-layer Linear Neural Networks

Amanda Olmin; Fredrik Lindsten

arXiv:2407.09845·stat.ML·September 20, 2024

TL;DR

This paper studies epoch-wise double descent in two-layer linear neural networks, giving a theoretical account of its mechanisms and identifying factors that drive it beyond those already known from linear regression.

Contribution

The paper derives a gradient flow for two-layer linear networks and identifies additional factors affecting epoch-wise double descent, extending the analysis beyond standard linear regression.
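For orientation, the block below writes out the textbook gradient flow for a two-layer linear network under quadratic loss, next to the linear regression case. It is a sketch under assumed notation and is not necessarily the specific flow derived in the paper: the symbols $W_1$, $W_2$, $\Sigma_{xx}$ and $\Sigma_{yx}$ are introduced here for illustration only.

```latex
% Assumed notation (not taken from the paper):
%   model f(x) = W_2 W_1 x, with W_1 \in R^{h \times d}, W_2 \in R^{o \times h}
%   \Sigma_{xx} = \frac{1}{n} \sum_i x_i x_i^\top   (input covariance)
%   \Sigma_{yx} = \frac{1}{n} \sum_i y_i x_i^\top   (input-output covariance)
\begin{align}
  \mathcal{L}(W_1, W_2) &= \frac{1}{2n} \sum_{i=1}^{n} \lVert y_i - W_2 W_1 x_i \rVert^2, \\
  \frac{\mathrm{d} W_1}{\mathrm{d} t} &= -\frac{\partial \mathcal{L}}{\partial W_1}
      = W_2^\top \left( \Sigma_{yx} - W_2 W_1 \Sigma_{xx} \right), \\
  \frac{\mathrm{d} W_2}{\mathrm{d} t} &= -\frac{\partial \mathcal{L}}{\partial W_2}
      = \left( \Sigma_{yx} - W_2 W_1 \Sigma_{xx} \right) W_1^\top .
\end{align}
% Linear regression with a single weight matrix W, for comparison:
\begin{equation}
  \frac{\mathrm{d} W}{\mathrm{d} t} = \Sigma_{yx} - W \Sigma_{xx} .
\end{equation}
```

In the two-layer case the residual term $\Sigma_{yx} - W_2 W_1 \Sigma_{xx}$ is multiplied by the other layer's weights, so the dynamics are nonlinear in the parameters even though the model is linear in the input.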

Findings

1. Epoch-wise double descent is influenced by the singular values of the input-output covariance matrix.
2. The derived gradient flow bridges the learning dynamics of linear regression and of two-layer linear networks.
3. Additional factors for double descent emerge with the extra layer.

Abstract

Epoch-wise double descent is the phenomenon where generalisation performance improves beyond the point of overfitting, resulting in a generalisation curve exhibiting two descents over the course of learning. Understanding the mechanisms driving this behaviour is crucial not only for understanding the generalisation behaviour of machine learning models in general, but also for employing conventional selection methods, such as the use of early stopping to mitigate overfitting. While we ultimately want to draw conclusions about more complex models, such as deep neural networks, a majority of theoretical results regarding the underlying cause of epoch-wise double descent are based on simple models, such as standard linear regression. In this paper, to take a step towards more complex models in theoretical analysis, we study epoch-wise double descent in two-layer linear neural networks. First,…
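The setting in the abstract can be explored numerically with a minimal sketch like the one below. It trains a two-layer linear network with scalar output by full-batch gradient descent (a discretised stand-in for gradient flow) and records the training and test error after every epoch, so the generalisation curve and the epoch that early stopping would select can be inspected. Everything here is an assumption made for illustration: the synthetic data, the function names (`make_data`, `train`, `best_epoch`) and the hyperparameters are hypothetical and not taken from the paper. Whether a second descent actually shows up depends on the data and the initialisation.

```python
import random

def make_data(n, d, w_true, noise, rng):
    """Synthetic regression data y = w_true . x + Gaussian noise (assumed setup)."""
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [sum(w * x for w, x in zip(w_true, row)) + rng.gauss(0.0, noise)
          for row in xs]
    return xs, ys

def predict(W1, w2, x):
    """Two-layer linear model: hidden = W1 x, output = w2 . hidden."""
    hidden = [sum(a * b for a, b in zip(row, x)) for row in W1]
    return sum(c * h for c, h in zip(w2, hidden)), hidden

def mse(W1, w2, xs, ys):
    return sum((predict(W1, w2, x)[0] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, xs_test, ys_test, hidden=4, lr=0.05, epochs=200, seed=0):
    """Full-batch gradient descent on the loss (1/2n) * sum of squared errors.

    Returns a list of (train_mse, test_mse) pairs, one per epoch.
    """
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    # Small random initialisation (an assumption, not the paper's choice).
    W1 = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(hidden)]
    w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
    history = []
    for _ in range(epochs):
        gW1 = [[0.0] * d for _ in range(hidden)]
        gw2 = [0.0] * hidden
        for x, y in zip(xs, ys):
            pred, h = predict(W1, w2, x)
            err = (pred - y) / n
            for k in range(hidden):
                gw2[k] += err * h[k]               # dL/dw2[k]
                for j in range(d):
                    gW1[k][j] += err * w2[k] * x[j]  # dL/dW1[k][j]
        # Update both layers simultaneously from the same gradients.
        for k in range(hidden):
            w2[k] -= lr * gw2[k]
            for j in range(d):
                W1[k][j] -= lr * gW1[k][j]
        history.append((mse(W1, w2, xs, ys), mse(W1, w2, xs_test, ys_test)))
    return history

def best_epoch(history):
    """Index of the epoch with the lowest test error (what early stopping targets)."""
    return min(range(len(history)), key=lambda i: history[i][1])

if __name__ == "__main__":
    rng = random.Random(1)
    w_true = [1.0, -0.5, 0.5, 0.0, 0.0]
    xs, ys = make_data(20, 5, w_true, 0.3, rng)
    xs_test, ys_test = make_data(50, 5, w_true, 0.3, rng)
    hist = train(xs, ys, xs_test, ys_test)
    print("epoch with lowest test error:", best_epoch(hist))
```

If the test curve has two descents, stopping at the first minimum can miss a better model later in training, which is the practical concern the abstract raises about early stopping.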

Taxonomy

Topics: Neural Networks and Applications · Neural Networks and Reservoir Computing

Methods: Linear Regression · Early Stopping