Linear Frequency Principle Model to Understand the Absence of Overfitting in Neural Networks

Yaoyu Zhang, Tao Luo, Zheng Ma, and Zhi-Qin John Xu

arXiv:2102.00200·cs.LG·May 26, 2021

TL;DR

This paper introduces a linear frequency principle model to explain why heavily parameterized neural networks avoid overfitting, emphasizing the importance of low-frequency learning in the training dynamics.

Contribution

It proposes a phenomenological LFP model capturing neural networks' tendency to learn low frequencies first, explaining non-overfitting behavior and connecting microscopic training dynamics to macroscopic frequency patterns.

Findings

01 Low-frequency dominance of the target function is crucial for non-overfitting.

02 The LFP model accurately predicts training dynamics.

03 Experiments verify the model's key predictions.

Abstract

Why heavily parameterized neural networks (NNs) do not overfit the data is an important, long-standing open question. We propose a phenomenological model of NN training to explain this non-overfitting puzzle. Our linear frequency principle (LFP) model accounts for a key dynamical feature of NNs: they learn low frequencies first, irrespective of microscopic details. Theory based on our LFP model shows that low-frequency dominance of target functions is the key condition for the non-overfitting of NNs, and this is verified by experiments. Furthermore, through an ideal two-layer NN, we unravel how detailed microscopic NN training dynamics statistically give rise to an LFP model with quantitative predictive power.
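
The "low frequencies first" behaviour described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's LFP equation: it assumes each Fourier mode of the learned function relaxes linearly and independently toward its target, with a rate that shrinks as frequency grows. The function names `simulate_mode_errors` and `hypothetical_rate`, the rate law 1 / (1 + k^2), and the example frequencies and amplitudes are all illustrative assumptions.

```python
import numpy as np


def simulate_mode_errors(freqs, target_amps, times, rate_fn):
    """Relative error of each Fourier mode under linear, mode-wise relaxation.

    Each mode k is assumed to obey d a_k/dt = -rate_fn(k) * (a_k - target_k)
    with a_k(0) = 0, so a_k(t) = target_k * (1 - exp(-rate_fn(k) * t)).
    Returns an array of shape (len(times), len(freqs)).
    """
    freqs = np.asarray(freqs, dtype=float)
    target_amps = np.asarray(target_amps, dtype=float)
    times = np.asarray(times, dtype=float)

    rates = rate_fn(freqs)                    # one relaxation rate per mode
    decay = np.exp(-np.outer(times, rates))   # rows: times, columns: modes
    learned = target_amps * (1.0 - decay)     # closed-form solution of the ODE
    return np.abs(learned - target_amps) / np.abs(target_amps)


def hypothetical_rate(freq):
    # Assumed rate law (not taken from the paper): higher frequencies relax
    # more slowly, which is the qualitative feature being illustrated.
    return 1.0 / (1.0 + freq ** 2)


freqs = np.array([1.0, 3.0, 9.0])          # illustrative frequencies
target_amps = np.array([1.0, 0.5, 0.25])   # illustrative target amplitudes
times = np.array([0.0, 5.0, 50.0])         # illustrative training times

rel_err = simulate_mode_errors(freqs, target_amps, times, hypothetical_rate)
for t, row in zip(times, rel_err):
    print(f"t = {t:5.1f}  relative error per mode: {np.round(row, 3)}")
```

At every positive time in this toy setup, the relative error grows with frequency, so the low-frequency mode converges first. Any rate function that decreases with frequency would show the same ordering; the particular form used here was chosen only for simplicity.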
