Towards Quantifying the Hessian Structure of Neural Networks

Zhaorui Dong; Yushun Zhang; Jianfeng Yao; Ruoyu Sun

arXiv:2505.02809 · cs.LG · September 23, 2025

TL;DR

This paper investigates the origin of the near-block-diagonal structure of neural network Hessians and shows that this structure emerges as the number of classes $C$ grows, which may help explain the Hessian structure observed in large models such as LLMs.

Contribution

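The paper provides a rigorous theoretical analysis of the static component ("static force") of the Hessian structure at random initialization, for linear models and 1-hidden-layer networks, and identifies the number of classes $C$ as a primary driver of the block-diagonal pattern.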

Findings

01. The block-diagonal structure of the Hessian emerges as the number of classes increases.

02. A theoretical analysis based on random matrix theory explains the structure at random initialization.

03. The large number of classes in LLMs likely contributes to their Hessian structure.

Abstract

Empirical studies have reported that the Hessian matrix of neural networks (NNs) exhibits a near-block-diagonal structure, yet its theoretical foundation remains unclear. In this work, we reveal that the reported Hessian structure comes from a mixture of two forces: a "static force" rooted in the architecture design, and a "dynamic force" arising from training. We then provide a rigorous theoretical analysis of the "static force" at random initialization. We study linear models and 1-hidden-layer networks for classification tasks with $C$ classes. By leveraging random matrix theory, we compare the limit distributions of the diagonal and off-diagonal Hessian blocks and find that the block-diagonal structure arises as $C$ becomes large. Our findings reveal that $C$ is one primary driver of the near-block-diagonal structure. These results may shed new light on the Hessian structure of large…
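
The claim that the block-diagonal structure arises as $C$ becomes large can be checked numerically in the simplest setting. The following is a minimal sketch, not the paper's method or the code from its repository. It assumes a linear softmax classifier with cross-entropy loss (the abstract does not name the loss), Gaussian inputs, and Gaussian weights with variance 1/dim. Under those assumptions the Hessian block for class rows $i$ and $j$ is the sample average of $p_i(\delta_{ij} - p_j)\, x x^\top$, which does not depend on the labels. The function name `offdiag_to_diag_ratio` and all parameter values are hypothetical, chosen for illustration only.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def offdiag_to_diag_ratio(num_classes, dim=4, num_samples=64, seed=0):
    """Mean Frobenius norm of off-diagonal Hessian blocks divided by the
    mean Frobenius norm of diagonal blocks, at random initialization.

    Assumed setting (not taken from the paper): linear model z = W x,
    softmax cross-entropy loss, x ~ N(0, I), W_ij ~ N(0, 1/dim).
    """
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_samples)]
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
         for _ in range(num_classes)]

    # Flattened outer products x x^T and class probabilities for each sample.
    outers = [[a * b for a in x for b in x] for x in xs]
    probs = [softmax([sum(wk * xk for wk, xk in zip(row, x)) for row in w])
             for x in xs]

    def block_norm(i, j):
        # Block (i, j) = mean over samples of p_i * (delta_ij - p_j) * x x^T.
        block = [0.0] * (dim * dim)
        delta = 1.0 if i == j else 0.0
        for p, outer in zip(probs, outers):
            coeff = p[i] * (delta - p[j]) / num_samples
            for k, v in enumerate(outer):
                block[k] += coeff * v
        return math.sqrt(sum(v * v for v in block))

    diag = [block_norm(i, i) for i in range(num_classes)]
    off = [block_norm(i, j)
           for i in range(num_classes)
           for j in range(num_classes) if i != j]
    return (sum(off) / len(off)) / (sum(diag) / len(diag))


if __name__ == "__main__":
    for c in (2, 4, 16):
        print(c, offdiag_to_diag_ratio(c))
```

For $C = 2$ the diagonal and off-diagonal blocks share the coefficient $p_1 p_2$ up to sign, so the ratio is 1 and no block structure is visible. As $C$ grows, each diagonal block equals minus the sum of the $C - 1$ off-diagonal blocks in its row, so the typical off-diagonal block shrinks relative to the diagonal one. This finite-size check only illustrates the trend; the paper's analysis concerns limit distributions obtained with random matrix theory.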

Code & Models

Repositories

zyushun/hessian-structure (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications