A simple connection from loss flatness to compressed neural representations
Shirui Chen, Stefano Recanatesi, Eric Shea-Brown

TL;DR
This paper establishes a theoretical and empirical link between loss sharpness and neural representation compression, showing that flatter minima lead to more compressed and stable neural representations across various architectures.
Contribution
It introduces new measures and bounds connecting sharpness to representation compression, providing a unified understanding of their relationship.
Findings
Flatter minima constrain neural representation compression.
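A positive correlation between sharpness and compression is observed across architectures.
Network-wide variants (NMLS, NVR) give tighter, more stable bounds than prior single-layer analyses, and the bounds extend to reparametrization-invariant sharpness.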
Abstract
Despite extensive study, the significance of sharpness -- the trace of the loss Hessian at local minima -- remains unclear. We investigate an alternative perspective: how sharpness relates to the geometric structure of neural representations, specifically representation compression, defined as how strongly neural activations concentrate under local input perturbations. We introduce three measures -- Local Volumetric Ratio (LVR), Maximum Local Sensitivity (MLS), and Local Dimensionality -- and derive upper bounds showing these are mathematically constrained by sharpness: flatter minima necessarily limit compression. We extend these bounds to reparametrization-invariant sharpness and introduce network-wide variants (NMLS, NVR) that provide tighter, more stable bounds than prior single-layer analyses. Empirically, we validate consistent positive correlations across feedforward,…
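To make the two quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation. Sharpness follows the abstract's definition (trace of the loss Hessian with respect to the parameters). The abstract does not give formulas for the compression measures, so the sensitivity term below is an assumption: it uses the largest singular value of the input-output Jacobian, averaged over inputs, as a stand-in for a local sensitivity measure. The toy model, the data, and all function names are hypothetical, and derivatives are approximated by finite differences to keep the example dependency-free.

```python
import math

def hessian_trace(loss, params, eps=1e-4):
    """Central-difference estimate of tr(H) = sum_i d^2 loss / d params_i^2."""
    base = loss(params)
    trace = 0.0
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        trace += (loss(up) - 2.0 * base + loss(down)) / eps ** 2
    return trace

def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian with J[i][j] = d f_i / d x_j."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        up = list(x)
        up[j] += eps
        down = list(x)
        down[j] -= eps
        f_up, f_down = f(up), f(down)
        for i in range(m):
            J[i][j] = (f_up[i] - f_down[i]) / (2.0 * eps)
    return J

def spectral_norm(J, iters=200):
    """Largest singular value of J via power iteration on J^T J."""
    m, n = len(J), len(J[0])
    v = [1.0 + 0.1 * j for j in range(n)]  # arbitrary non-symmetric start
    for _ in range(iters):
        Jv = [sum(J[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(J[i][j] * Jv[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return 0.0
        v = [c / norm for c in w]
    Jv = [sum(J[i][j] * v[j] for j in range(n)) for i in range(m)]
    return math.sqrt(sum(c * c for c in Jv))

# Hypothetical toy model: one tanh layer with a 2x2 weight matrix (flattened).
def model(W, x):
    return [math.tanh(W[0] * x[0] + W[1] * x[1]),
            math.tanh(W[2] * x[0] + W[3] * x[1])]

DATA = [([0.5, -0.2], [0.3, 0.1]), ([-0.4, 0.9], [-0.2, 0.5])]  # made-up points

def mse(W):
    total = 0.0
    for x, y in DATA:
        total += sum((p - t) ** 2 for p, t in zip(model(W, x), y))
    return total / len(DATA)

# Illustrative weights only; the abstract defines sharpness at a local minimum,
# so in practice W would come from a trained network.
W = [0.8, -0.3, 0.2, 0.6]
sharpness = hessian_trace(mse, W)
sensitivity = sum(
    spectral_norm(jacobian(lambda x: model(W, x), x)) for x, _ in DATA
) / len(DATA)
print("sharpness (Hessian trace):", sharpness)
print("mean Jacobian spectral norm:", sensitivity)
```

Note the two different derivatives involved: sharpness differentiates the loss with respect to the weights, while the sensitivity term differentiates the network output with respect to the input. For real networks an automatic-differentiation library would replace the finite differences used here.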
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques
Methods: Stochastic Gradient Descent
