Towards Quantifying the Hessian Structure of Neural Networks
Zhaorui Dong, Yushun Zhang, Jianfeng Yao, Ruoyu Sun

TL;DR
This paper investigates the origins of the near-block-diagonal structure of neural network Hessians, revealing that the number of classes significantly influences this structure, especially in large models like LLMs.
Contribution
It provides a theoretical analysis of the static component of Hessian structure at initialization, highlighting the role of class count in the block-diagonal pattern.
Findings
The block-diagonal structure of the Hessian emerges as the number of classes increases.
Theoretical analysis based on random matrix theory explains the structure at initialization.
Large class numbers in LLMs likely contribute to their Hessian structure.
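The first finding can be checked numerically on a toy problem. The sketch below is not the authors' code. It assumes a linear softmax classifier with cross-entropy loss, Gaussian inputs, Gaussian random initialization, and a partition of the Hessian into one block per pair of class-weight vectors. The function names `ce_hessian_blocks` and `off_diag_ratio`, and the choice of the Frobenius norm as the block-size measure, are illustrative assumptions. For this model the block for classes i and j is the sample mean of (p_i [i = j] - p_i p_j) x x^T, where p is the softmax output.

```python
import numpy as np


def ce_hessian_blocks(W, X):
    """Hessian blocks of the mean softmax cross-entropy loss of a linear model.

    W: (C, d) class-weight rows, X: (n, d) inputs.
    Returns H of shape (C, C, d, d), where H[i, j] is the block of second
    derivatives with respect to rows w_i and w_j. Labels do not appear
    because the cross-entropy Hessian of a linear model does not depend on them.
    """
    logits = X @ W.T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)                    # softmax, shape (n, C)
    n, C = P.shape
    # Per-sample coefficient A[s, i, j] = p_i * [i == j] - p_i * p_j
    A = -P[:, :, None] * P[:, None, :]
    idx = np.arange(C)
    A[:, idx, idx] += P
    # H[i, j] = mean over samples of A[s, i, j] * x_s x_s^T
    return np.einsum("sij,sk,sl->ijkl", A, X, X, optimize=True) / n


def off_diag_ratio(C, d=8, n=200, seed=0):
    """Mean Frobenius norm of off-diagonal blocks divided by that of diagonal blocks."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    W = rng.standard_normal((C, d)) / np.sqrt(d)  # assumed random initialization
    H = ce_hessian_blocks(W, X)
    norms = np.linalg.norm(H, axis=(2, 3))        # (C, C) matrix of block norms
    diag = np.diag(norms).mean()
    off = norms[~np.eye(C, dtype=bool)].mean()
    return off / diag


if __name__ == "__main__":
    for C in (2, 8, 32):
        print(C, off_diag_ratio(C))
```

In this toy setting the ratio equals 1 for two classes and shrinks as the class count grows, which is consistent with the finding. It says nothing about trained networks or the 1-hidden-layer case analyzed in the paper.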
Abstract
Empirical studies have reported that the Hessian matrix of neural networks (NNs) exhibits a near-block-diagonal structure, yet its theoretical foundation remains unclear. In this work, we reveal that the reported Hessian structure comes from a mixture of two forces: a "static force" rooted in the architecture design, and a "dynamic force" arising from training. We then provide a rigorous theoretical analysis of the "static force" at random initialization. We study linear models and 1-hidden-layer networks for classification tasks with C classes. By leveraging random matrix theory, we compare the limit distributions of the diagonal and off-diagonal Hessian blocks and find that the block-diagonal structure arises as C becomes large. Our findings reveal that C is one primary driver of the near-block-diagonal structure. These results may shed new light on the Hessian structure of large…
Taxonomy
Topics: Neural Networks and Applications
