On the Power-Law Hessian Spectrums in Deep Learning
Zeke Xie, Qian-Yuan Tang, Yunfeng Cai, Mingming Sun, and Ping Li

TL;DR
This paper reveals that the Hessian spectra of well-trained deep neural networks follow a power-law distribution, providing a theoretical explanation and demonstrating its usefulness in understanding deep learning behaviors.
Contribution
First to demonstrate the power-law structure in Hessian spectra of deep neural networks and provide a maximum-entropy theoretical explanation.
Findings
Hessian spectra exhibit power-law distributions in well-trained networks.
The power-law structure helps explain spectral behaviors in deep learning.
Spectral analysis reveals parallels between protein evolution and neural network training.
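One way to check a spectrum for power-law structure is to sort the eigenvalues in descending order and fit a straight line to log-eigenvalue against log-rank. The sketch below does this on synthetic data rather than a real Hessian. The rank-order form lambda_k ≈ lambda_1 * k^(-s), the helper name `fit_rank_power_law`, and all numeric values are assumptions made for illustration, not taken from the paper.

```python
import numpy as np

def fit_rank_power_law(eigenvalues):
    """Least-squares fit of log(lambda_k) = log(lambda_1) - s * log(k).

    Hypothetical helper for illustration; returns (s, lambda_1).
    """
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending order
    lam = lam[lam > 0]                                         # log needs positive values
    ranks = np.arange(1, lam.size + 1, dtype=float)
    slope, intercept = np.polyfit(np.log(ranks), np.log(lam), deg=1)
    return -slope, np.exp(intercept)

# Synthetic "spectrum": an exact power law with small multiplicative noise.
# These numbers are assumptions, not measurements from any trained network.
rng = np.random.default_rng(0)
k = np.arange(1, 1001, dtype=float)
true_s = 1.2
synthetic = 50.0 * k ** (-true_s) * np.exp(rng.normal(0.0, 0.05, size=k.size))

s_hat, lam1_hat = fit_rank_power_law(synthetic)
print(f"estimated exponent s = {s_hat:.3f}, estimated lambda_1 = {lam1_hat:.2f}")
```

On a real network, the input would be the leading Hessian eigenvalues obtained from a separate eigenvalue estimation routine. A straight line on the log-log plot is suggestive of a power law but is not a formal goodness-of-fit test.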
Abstract
It is well known that the Hessian of the deep loss landscape matters to optimization, generalization, and even robustness of deep learning. Recent works empirically discovered that the Hessian spectrum in deep learning has a two-component structure that consists of a small number of large eigenvalues and a large number of nearly-zero eigenvalues. However, the theoretical mechanism or the mathematics behind the Hessian spectrum is still largely under-explored. To the best of our knowledge, we are the first to demonstrate that the Hessian spectrums of well-trained deep neural networks exhibit simple power-law structures. Inspired by statistical physics theories and the spectral analysis of natural proteins, we provide a maximum-entropy theoretical interpretation for explaining why the power-law structure exists and suggest a spectral parallel between protein evolution and training of…
Taxonomy
Topics: Machine Learning in Materials Science · Protein Structure and Dynamics · Statistical Mechanics and Entropy
