Local properties of neural networks through the lens of layer-wise Hessians
Maxim Bolshim (1), Alexander Kugaevskikh (1) ((1) ITMO University, Saint Petersburg, Russia)

TL;DR
This paper introduces a layer-wise Hessian analysis framework for studying neural network geometry. Across 111 experiments on 37 datasets, the eigenvalue spectra of the per-layer Hessians show patterns linked to overfitting, underparameterization, and generalization.
Contribution
The paper presents a methodology for analyzing neural networks through local (layer-wise) Hessians, connecting the local geometry of the parameter space to network performance and training dynamics.
Findings
Spectral properties of local Hessians, such as the eigenvalue distribution, correlate with overfitting, underparameterization, and expressivity.
Local Hessians evolve with consistent structural regularities during training.
Local Hessian spectra correlate with generalization performance across the 37 datasets studied.
Abstract
We introduce a methodology for analyzing neural networks through the lens of layer-wise Hessian matrices. The local Hessian of each functional block (layer) is defined as the matrix of second derivatives of a scalar function with respect to the parameters of that layer. This concept provides a formal tool for characterizing the local geometry of the parameter space. We show that the spectral properties of local Hessians, such as the distribution of eigenvalues, reveal quantitative patterns associated with overfitting, underparameterization, and expressivity in neural network architectures. We conduct an extensive empirical study involving 111 experiments across 37 datasets. The results demonstrate consistent structural regularities in the evolution of local Hessians during training and highlight correlations between their spectra and generalization performance. These findings establish…
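To make the definition of a local Hessian concrete, here is a minimal sketch that is not the authors' implementation. It assumes a toy two-layer tanh network with a mean-squared-error loss as the scalar function, and it approximates each layer's Hessian with central finite differences. All names (`loss`, `local_hessian`, `w1`, `w2`) and the random data are hypothetical and chosen only for illustration.

```python
import numpy as np

def loss(w1, w2, x, y):
    """Mean squared error of a tiny two-layer tanh network (illustrative assumption)."""
    hidden = np.tanh(x @ w1)
    pred = hidden @ w2
    return float(np.mean((pred - y) ** 2))

def local_hessian(f, theta, eps=1e-4):
    """Central finite-difference Hessian of a scalar function f at the flat vector theta."""
    n = theta.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            e_i = np.zeros(n)
            e_i[i] = eps
            e_j = np.zeros(n)
            e_j[j] = eps
            H[i, j] = (f(theta + e_i + e_j) - f(theta + e_i - e_j)
                       - f(theta - e_i + e_j) + f(theta - e_i - e_j)) / (4 * eps ** 2)
            H[j, i] = H[i, j]  # the Hessian is symmetric
    return H

# Hypothetical random data and weights, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 3))
y = rng.normal(size=(32, 1))
w1 = rng.normal(size=(3, 4)) * 0.5
w2 = rng.normal(size=(4, 1)) * 0.5

# Local Hessian of layer 1: differentiate w.r.t. w1 only, holding w2 fixed.
H1 = local_hessian(lambda t: loss(t.reshape(w1.shape), w2, x, y), w1.ravel())
# Local Hessian of layer 2: differentiate w.r.t. w2 only, holding w1 fixed.
H2 = local_hessian(lambda t: loss(w1, t.reshape(w2.shape), x, y), w2.ravel())

# Eigenvalue spectra of the two local Hessians (ascending order).
eig1 = np.linalg.eigvalsh(H1)
eig2 = np.linalg.eigvalsh(H2)
print("layer 1 spectrum:", eig1)
print("layer 2 spectrum:", eig2)
```

Each local Hessian is a square matrix whose side equals the number of parameters in that layer (12 and 4 here), so its spectrum can be inspected per layer. Dense finite differences are only practical for very small layers. For realistic networks, automatic differentiation and Hessian-vector products would be the more usual choice.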
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning in Materials Science
