A simple connection from loss flatness to compressed neural representations

Shirui Chen; Stefano Recanatesi; Eric Shea-Brown

arXiv:2310.01770 · cs.LG · February 24, 2026

Open Access

TL;DR

This paper establishes a theoretical and empirical link between loss sharpness and neural representation compression, showing that flatter minima lead to more compressed and stable neural representations across various architectures.

Contribution

The paper introduces three measures of representation compression and derives upper bounds that tie them to sharpness, giving a unified account of how the two relate.

Findings

1. Flatter minima constrain neural representation compression.
2. Sharpness and compression correlate positively across architectures.
3. The bounds extend to reparametrization-invariant sharpness, and the network-wide variants (NMLS, NVR) give tighter, more stable bounds than prior single-layer analyses.

Abstract

Despite extensive study, the significance of sharpness -- the trace of the loss Hessian at local minima -- remains unclear. We investigate an alternative perspective: how sharpness relates to the geometric structure of neural representations, specifically representation compression, defined as how strongly neural activations concentrate under local input perturbations. We introduce three measures -- Local Volumetric Ratio (LVR), Maximum Local Sensitivity (MLS), and Local Dimensionality -- and derive upper bounds showing these are mathematically constrained by sharpness: flatter minima necessarily limit compression. We extend these bounds to reparametrization-invariant sharpness and introduce network-wide variants (NMLS, NVR) that provide tighter, more stable bounds than prior single-layer analyses. Empirically, we validate consistent positive correlations across feedforward,…
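
To make the two kinds of quantity in the abstract concrete, here is a minimal sketch on a toy linear model. It estimates sharpness as the trace of the loss Hessian with respect to the parameters, and a local sensitivity as the largest singular value of the input-output Jacobian. The abstract does not give the formula for MLS, so treating it as a Jacobian spectral norm is an assumption here, and the function names, finite-difference estimators, and toy data are all hypothetical rather than the authors' method. The sketch only computes the two numbers; it does not reproduce the paper's bounds.

```python
import numpy as np

def hessian_trace(loss, theta, eps=1e-4):
    """Central finite-difference estimate of tr(H) of `loss` at `theta`."""
    base = loss(theta)
    trace = 0.0
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        # Second difference along coordinate i approximates H[i, i].
        trace += (loss(theta + step) - 2.0 * base + loss(theta - step)) / eps**2
    return trace

def max_local_sensitivity(f, x, eps=1e-5):
    """Largest singular value of the finite-difference Jacobian of `f` at `x`.

    Assumption: this stands in for the paper's MLS, whose exact
    definition is not stated in the abstract.
    """
    cols = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        cols.append((f(x + step) - f(x - step)) / (2.0 * eps))
    jac = np.stack(cols, axis=1)  # shape: (output_dim, input_dim)
    return np.linalg.svd(jac, compute_uv=False)[0]

# Toy setup (hypothetical): a linear map fitted exactly, so theta is a minimum.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))       # 50 inputs of dimension 3
W_true = rng.normal(size=(2, 3))   # 2 outputs
Y = X @ W_true.T

def loss(theta):
    W = theta.reshape(2, 3)
    resid = X @ W.T - Y
    return 0.5 * np.mean(np.sum(resid**2, axis=1))

sharpness = hessian_trace(loss, W_true.ravel())
# For this quadratic loss the Hessian trace has a closed form:
# (output_dim) * mean squared input norm.
expected_sharpness = 2 * np.mean(np.sum(X**2, axis=1))

mls = max_local_sensitivity(lambda x: W_true @ x, X[0])

print("sharpness (finite differences):", sharpness)
print("sharpness (closed form):      ", expected_sharpness)
print("local sensitivity at X[0]:    ", mls)
```

For a linear model the Jacobian is the weight matrix itself, so the sensitivity is the same at every input. In a nonlinear network it would vary from point to point and would normally be averaged or maximized over the data.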

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques

Methods: Stochastic Gradient Descent