Training behavior of deep neural network in frequency domain
Zhi-Qin John Xu, Yaoyu Zhang, Yanyang Xiao

TL;DR
This paper investigates the training dynamics of deep neural networks, revealing a frequency-based learning pattern where low-frequency components are learned faster than high-frequency ones, which helps explain generalization.
Contribution
The study introduces the Frequency Principle (F-Principle), a novel empirical observation that DNNs learn low-frequency components first across various architectures and training methods.
Findings
DNNs quickly capture low-frequency components
High-frequency components are learned more slowly
The F-Principle helps explain the effects of early stopping and the generalization of DNNs
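The findings above are statements about individual frequency components, so it can help to see how such a component-wise comparison might be measured. The sketch below is an illustration, not the authors' code: it compares the discrete Fourier transform of a model output with that of the target and reports a relative error per frequency index. The function name `relative_frequency_error`, the 1-D target, and the stand-in "early output" are all assumptions made for this example.

```python
import numpy as np

def relative_frequency_error(target, output, freq_indices):
    """Relative DFT error |F[output](k) - F[target](k)| / |F[target](k)| at each index k."""
    target_hat = np.fft.rfft(target)
    output_hat = np.fft.rfft(output)
    k = np.asarray(freq_indices)
    return np.abs(output_hat[k] - target_hat[k]) / np.abs(target_hat[k])

# Hypothetical 1-D target: a dominant low-frequency term plus a weaker high-frequency term.
n = 256
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
target = np.sin(x) + 0.5 * np.sin(5.0 * x)

# Stand-in for a network output early in training (an assumption, no network is trained here):
# only the low-frequency component has been fitted so far.
early_output = np.sin(x)

# Index 1 is the low-frequency component, index 5 the high-frequency one.
errors = relative_frequency_error(target, early_output, [1, 5])
print(errors)
```

In an actual experiment, `early_output` would be replaced by the network's predictions on the sample points at successive training steps, and the F-Principle would show up as the low-frequency error dropping before the high-frequency error does.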
Abstract
Why deep neural networks (DNNs) capable of overfitting often generalize well in practice is a mystery [#zhang2016understanding]. To find a potential mechanism, we focus on the study of implicit biases underlying the training process of DNNs. In this work, for both real and synthetic datasets, we empirically find that a DNN with common settings first quickly captures the dominant low-frequency components, and then relatively slowly captures the high-frequency ones. We call this phenomenon the Frequency Principle (F-Principle). The F-Principle can be observed over DNNs of various structures, activation functions, and training algorithms in our experiments. We also illustrate how the F-Principle helps us understand the effect of early stopping as well as the generalization of DNNs. This F-Principle potentially provides insights into a general principle underlying DNN optimization and…
Taxonomy
Topics: Neural Networks and Applications · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
