The Geometry of Projection Heads: Conditioning, Invariance, and Collapse
Faris Chaudhry

TL;DR
This paper presents a geometric theory of projection heads in self-supervised learning, modeling them as trainable Riemannian metrics that influence invariance, collapse, and information trade-offs.
Contribution
It introduces a geometric framework for understanding projection heads, revealing their role as universal geometric buffers that decouple the backbone from pretraining constraints.
Findings
Linear heads perform implicit subspace whitening.
Smooth nonlinear heads induce negative Hessian eigenvalues at collapsed equilibria, making those equilibria unstable.
Gradient flow dynamics and BatchNorm influence head stability and invariance.
Abstract
We develop a geometric theory of projection heads in self-supervised learning by modeling the head as a trainable Riemannian metric on the backbone representation manifold. We show that linear heads perform implicit subspace whitening, while nonlinear heads adapt local metrics to satisfy the specific topological constraints of the loss, with head depth empirically dictating this capacity. Analyzing dimensional collapse, we prove that smooth nonlinear heads natively induce negative eigenvalues in the Hessian at collapsed equilibria, making them unstable. We empirically validate this by continuously tracking the optimization geometry during training, which reveals that smooth activations like Swish can generate explicit negative curvature to escape collapse, whereas linear and ReLU heads under continuous-time gradient flow cannot, relying instead on discrete-time optimization dynamics and…
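To make the phrase "subspace whitening" concrete, here is a minimal NumPy sketch of a linear map that whitens features within their top-k principal subspace. It is an illustration only, not the paper's method: the abstract says linear heads reach this behaviour implicitly through training, whereas this sketch computes the map in closed form from an eigendecomposition. The function name `subspace_whitening_map`, the synthetic features, and the choice k = 3 are all assumptions made for the example.

```python
import numpy as np


def subspace_whitening_map(features, k):
    """Return a (k, d) linear map that whitens the top-k principal subspace.

    Hypothetical helper for illustration; not taken from the paper.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(features) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    scale = 1.0 / np.sqrt(eigvals[top])     # rescale each kept direction to unit variance
    return (eigvecs[:, top] * scale).T


rng = np.random.default_rng(0)

# Synthetic stand-in for backbone features (an assumption): three
# informative directions with very unequal variance, embedded in 8 dims.
latent = rng.normal(size=(2000, 3)) * np.array([5.0, 1.0, 0.2])
mixing = rng.normal(size=(3, 8))
features = latent @ mixing

W = subspace_whitening_map(features, k=3)
projected = (features - features.mean(axis=0)) @ W.T

# Covariance of the projected features; close to the 3x3 identity.
projected_cov = np.cov(projected, rowvar=False)
```

After the map is applied, every retained direction has unit variance and the directions are uncorrelated, so no single high-variance direction dominates the output space.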
