Multi-scale Feature Learning Dynamics: Insights for Double Descent

Mohammad Pezeshki; Amartya Mitra; Yoshua Bengio; Guillaume Lajoie

arXiv:2112.03215 · cs.LG · December 7, 2021 · 6 cites

TL;DR

This paper investigates epoch-wise double descent in deep learning and shows that different features are learned at different times, which explains the non-monotonic behavior of the test error during training.

Contribution

The paper gives a theoretical analysis of epoch-wise double descent using tools from statistical physics, and validates it against numerical experiments and observations in deep neural networks.

Findings

1. Epoch-wise double descent arises from features that are learned at different scales.

2. Slower-learning features cause the second descent in test error.

3. The theory accurately predicts the numerical experiments and remains consistent with behaviors observed in deep neural networks.
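The claim that features at different scales are learned at different times can be illustrated with a toy calculation. The sketch below is not the paper's model or code: it assumes a noiseless linear student trained by plain gradient descent on a quadratic loss with two independent features, and the names `residual_trajectories`, `steps_to_fit`, and all parameter values are hypothetical. It only shows the time-scale separation; it does not reproduce double descent itself, which also requires label noise and a finite training set.

```python
def residual_trajectories(scales, teacher, lr, steps):
    """Gradient descent on L(w) = 0.5 * sum_i scales[i]**2 * (w[i] - teacher[i])**2.

    Returns one entry per step (including step 0) holding the per-feature
    absolute residuals |w[i] - teacher[i]|, with the student started at w = 0.
    """
    w = [0.0 for _ in teacher]
    history = [[abs(wi - ti) for wi, ti in zip(w, teacher)]]
    for _ in range(steps):
        # The gradient along feature i is scales[i]**2 * (w[i] - teacher[i]),
        # so each residual shrinks by a factor (1 - lr * scales[i]**2) per step.
        w = [wi - lr * (s ** 2) * (wi - ti) for wi, s, ti in zip(w, scales, teacher)]
        history.append([abs(wi - ti) for wi, ti in zip(w, teacher)])
    return history


def steps_to_fit(history, feature, tol=0.05):
    """First step at which the residual of `feature` drops below `tol`, else None."""
    for t, residuals in enumerate(history):
        if residuals[feature] < tol:
            return t
    return None


# Assumed setup: one large-scale (fast) feature and one small-scale (slow) feature.
history = residual_trajectories(scales=[1.0, 0.1], teacher=[1.0, 1.0], lr=0.5, steps=2000)
fast_step = steps_to_fit(history, feature=0)
slow_step = steps_to_fit(history, feature=1)
print("fast feature fitted at step", fast_step)
print("slow feature fitted at step", slow_step)
```

With these assumed values, the small-scale feature takes about two orders of magnitude more steps to fit than the large-scale one, because its per-step contraction factor `1 - lr * scale**2` is much closer to 1.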

Abstract

A key challenge in building theoretical foundations for deep learning is the complex optimization dynamics of neural networks, resulting from the high-dimensional interactions between the large number of network parameters. Such non-trivial dynamics lead to intriguing behaviors such as the phenomenon of "double descent" of the generalization error. The more commonly studied aspect of this phenomenon corresponds to model-wise double descent where the test error exhibits a second descent with increasing model complexity, beyond the classical U-shaped error curve. In this work, we investigate the origins of the less studied epoch-wise double descent in which the test error undergoes two non-monotonous transitions, or descents as the training time increases. By leveraging tools from statistical physics, we study a linear teacher-student setup exhibiting epoch-wise double descent similar to…

Code & Models

Repositories

nndoubledescent/doubledescent (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference