TL;DR
This paper investigates the layerwise Hessian eigenspectra of deep neural networks, revealing similarities across layers and proposing a regularizer based on the Hessian trace that improves generalization by encouraging flatter minima.
Contribution
It introduces a layerwise loss landscape analysis, shows that Hessian eigenspectra are similar across layers and to the full-network spectrum, and proposes a Hessian trace regularizer for better generalization.
Findings
Layerwise Hessian eigenspectra are largely similar to the eigenspectrum of the full-network Hessian.
The Hessian eigenspectra of the middle layers most closely resemble the overall spectrum.
Hessian trace decreases as training progresses, correlating with improved generalization.
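The layerwise trace mentioned in these findings can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes Hutchinson's stochastic trace estimator with Rademacher probes and a finite-difference Hessian-vector product, and it treats a "layer" as a block of parameter indices. The names `hvp`, `hutchinson_trace`, `toy_grad`, and the matrix `A` are hypothetical and exist only for this example.

```python
import random

def hvp(grad, w, v, eps=1e-4):
    """Approximate the Hessian-vector product H v by central differences of the gradient."""
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus, g_minus = grad(w_plus), grad(w_minus)
    return [(gp - gm) / (2 * eps) for gp, gm in zip(g_plus, g_minus)]

def hutchinson_trace(grad, w, block=None, n_samples=200, seed=0):
    """Estimate tr(H), or the trace of the diagonal Hessian block indexed by `block`.

    Uses E[v^T H v] = tr(H) for random sign vectors v. Restricting v to a
    parameter block gives the trace of that block's Hessian only.
    """
    rng = random.Random(seed)
    idx = list(range(len(w))) if block is None else list(block)
    total = 0.0
    for _ in range(n_samples):
        v = [0.0] * len(w)
        for i in idx:
            v[i] = rng.choice((-1.0, 1.0))  # Rademacher probe on the block
        hv = hvp(grad, w, v)
        total += sum(v[i] * hv[i] for i in idx)
    return total / n_samples

# Toy quadratic loss L(w) = 0.5 * w^T A w, so the Hessian is exactly A (assumed example).
A = [[4.0, 0.5, 0.0, 0.0],
     [0.5, 3.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 0.5],
     [0.0, 0.0, 0.5, 1.0]]

def toy_grad(w):
    """Analytic gradient A w of the toy quadratic loss."""
    return [sum(a_ij * w_j for a_ij, w_j in zip(row, w)) for row in A]

w0 = [0.3, -0.2, 0.1, 0.5]
layers = {"layer_1": [0, 1], "layer_2": [2, 3]}  # hypothetical parameter blocks

full_trace = hutchinson_trace(toy_grad, w0)
layer_traces = {name: hutchinson_trace(toy_grad, w0, block=idx)
                for name, idx in layers.items()}
```

For this toy matrix the exact traces are 10 for the full Hessian, 7 for the first block, and 3 for the second, so the estimates should land close to those values. In a real network the gradient would come from automatic differentiation, and a trace penalty would be added to the training loss. The exact form of the paper's regularizer is not given in this summary.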
Abstract
Loss landscape analysis is extremely useful for a deeper understanding of the generalization ability of deep neural network models. In this work, we propose a layerwise loss landscape analysis in which the loss surface at every layer is studied independently, along with how each correlates with the overall loss surface. We study the layerwise loss landscape through the eigenspectra of the Hessian at each layer. In particular, our results show that the layerwise Hessian geometry is largely similar to that of the entire Hessian. We also report an interesting phenomenon: the Hessian eigenspectra of the middle layers of the deep neural network are observed to be most similar to the overall Hessian eigenspectrum. We also show that the maximum eigenvalue and the trace of the Hessian (both full network and layerwise) decrease as training of the network progresses. We leverage these observations to…