Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It

Marvin F. da Silva; Felix Dangel; Sageev Oore

arXiv:2505.05409·cs.LG·May 9, 2025


TL;DR

This paper introduces a symmetry-aware sharpness measure for transformers using Riemannian geometry, which better correlates with their generalization performance than previous measures.

Contribution

It redefines sharpness on the quotient manifold obtained by quotienting out transformer symmetries, so that the measure no longer depends on which symmetry-equivalent parameterization of the network is chosen.

Findings

01 Geodesic sharpness correlates strongly with generalization in transformers.

02 Higher-order approximations of geodesics improve sharpness's predictive power.

03 The method applies to both synthetic and real-world transformer models.

Abstract

The concept of sharpness has been successfully applied to traditional architectures like MLPs and CNNs to predict their generalization. For transformers, however, recent work reported weak correlation between flatness and generalization. We argue that existing sharpness measures fail for transformers, because they have much richer symmetries in their attention mechanism that induce directions in parameter space along which the network or its loss remain identical. We posit that sharpness must account fully for these symmetries, and thus we redefine it on a quotient manifold that results from quotienting out the transformer symmetries, thereby removing their ambiguities. Leveraging tools from Riemannian geometry, we propose a fully general notion of sharpness, in terms of a geodesic ball on the symmetry-corrected quotient manifold. In practice, we need to resort to approximating the…
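To make the symmetry argument in the abstract concrete, here is a minimal NumPy sketch of one well-known attention symmetry: multiplying the query projection by any invertible matrix A and the key projection by the inverse transpose of A leaves the attention weights unchanged. The names (`attention_weights`, `W_q`, `W_k`, `A`) and the toy dimensions are assumptions made for illustration and are not taken from the paper, which may treat a broader family of symmetries.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_weights(X, W_q, W_k):
    """Single-head attention weights softmax((X W_q)(X W_k)^T / sqrt(d_head))."""
    d_head = W_q.shape[1]
    logits = (X @ W_q) @ (X @ W_k).T / np.sqrt(d_head)
    return softmax(logits)

rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 5, 8, 4  # toy sizes, chosen arbitrarily

X = rng.normal(size=(n_tokens, d_model))
W_q = rng.normal(size=(d_model, d_head))
W_k = rng.normal(size=(d_model, d_head))

# Build a well-conditioned invertible A: an orthogonal factor times a
# positive diagonal scaling (hypothetical choice, any invertible A works).
Q, _ = np.linalg.qr(rng.normal(size=(d_head, d_head)))
A = Q @ np.diag(rng.uniform(0.5, 2.0, size=d_head))

# Move along the symmetry orbit: the parameters change, the function does not.
W_q_new = W_q @ A
W_k_new = W_k @ np.linalg.inv(A).T

attn_before = attention_weights(X, W_q, W_k)
attn_after = attention_weights(X, W_q_new, W_k_new)

param_shift = np.linalg.norm(W_q_new - W_q) + np.linalg.norm(W_k_new - W_k)
max_output_diff = np.abs(attn_before - attn_after).max()
print("parameter shift:", param_shift)
print("max change in attention weights:", max_output_diff)
```

The parameter shift is clearly nonzero while the attention weights agree up to floating-point error, so any sharpness measure computed on a ball in raw parameter space can be altered by moving along such an orbit without changing the network's predictions.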


Videos

Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face Recognition and Perception · Face recognition and analysis

Methods: Softmax · Attention Is All You Need