Deep frequency principle towards understanding why deeper learning is faster
Zhi-Qin John Xu, Hanxu Zhou

TL;DR
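This paper uses Fourier analysis to show empirically that, during training, the effective target function seen by a deeper hidden layer is biased towards lower frequency. Because the later layers then fit a lower-frequency function, this offers a candidate explanation for why deeper feedforward networks often train faster.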
Contribution
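The paper proposes the deep frequency principle: the effective target function for a deeper hidden layer biases towards lower frequency during training. It supports this with a filtering method for characterizing the frequency distribution of high-dimensional functions, and offers it as an empirical mechanism for faster training in deeper networks.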
Findings
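The effective target function for a deeper hidden layer biases towards lower frequency during training.
The learning component therefore effectively fits a lower-frequency function, which is offered as the reason deeper learning is faster.
Experiments with deep networks on real datasets support the deep frequency principle.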
Abstract
Understanding the effect of depth in deep learning is a critical problem. In this work, we use Fourier analysis to empirically provide a promising mechanism for understanding why feedforward deeper learning is faster. To this end, we separate a deep neural network, trained by normal stochastic gradient descent, into two parts during analysis, i.e., a pre-condition component and a learning component, in which the output of the pre-condition component is the input of the learning component. We use a filtering method to characterize the frequency distribution of a high-dimensional function. Based on experiments with deep networks and real datasets, we propose a deep frequency principle, that is, the effective target function for a deeper hidden layer biases towards lower frequency during training. Therefore, the learning component effectively learns a lower frequency function if the pre-condition…
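The abstract mentions a filtering method for characterizing the frequency content of a high-dimensional function but does not spell it out here. As a rough illustration only, the sketch below assumes one plausible form: smooth the sampled outputs with a Gaussian kernel over the inputs and report the fraction of energy that survives. The function name low_frequency_ratio and the width parameter delta are hypothetical choices for this sketch, not names taken from the paper.

```python
import math


def low_frequency_ratio(xs, ys, delta):
    """Fraction of the energy of ys that survives Gaussian smoothing over xs.

    xs: list of input points, each a sequence of floats.
    ys: list of scalar outputs, one per input point.
    delta: width of the Gaussian filter (hypothetical parameter, an assumption).
    """
    smoothed = []
    for xi in xs:
        # Gaussian weight from the squared Euclidean distance to every sample.
        weights = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / (2.0 * delta))
            for xj in xs
        ]
        total = sum(weights)
        # Normalized weighted average: the low-pass filtered output at xi.
        smoothed.append(sum(w * y for w, y in zip(weights, ys)) / total)
    return sum(s * s for s in smoothed) / sum(y * y for y in ys)


# Toy 1-D comparison: a slow sine versus a fast sine on the same grid.
n = 200
xs = [(i / (n - 1),) for i in range(n)]
smooth = [math.sin(2 * math.pi * x[0]) for x in xs]
rough = [math.sin(2 * math.pi * 40 * x[0]) for x in xs]

lfr_smooth = low_frequency_ratio(xs, smooth, delta=1e-3)
lfr_rough = low_frequency_ratio(xs, rough, delta=1e-3)
```

Under this assumed measure, a ratio near 1 means the function is dominated by low frequencies at the chosen filter width. In the setting the abstract describes, xs would be the outputs of the pre-condition component at some hidden layer and ys the training labels, so the ratio could be compared across layers.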
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Adversarial Robustness in Machine Learning
