Generalized Linear Mode Connectivity for Transformers
Alexander Theus, Alessandro Cabodi, Sotiris Anagnostidis, Antonio Orvieto, Sidak Pal Singh, and Valentina Boeva

TL;DR
This paper introduces a unified symmetry framework for analyzing the loss landscapes of Transformers. It reveals low- and zero-barrier linear paths between independently trained models, including models with different architectures.
Contribution
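It broadens the symmetries considered when aligning neural networks from neuron permutations alone to four classes: permutations, semi-permutations, orthogonal transformations, and general invertible maps. This enables the discovery of linear low-loss paths between models and across architectures, which permutation-only approaches fail to capture.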
Findings
Low- and zero-barrier linear paths exist between independently trained Transformers.
The framework extends to multi-model and heterogeneous-architecture settings.
The results reveal deeper structure in the loss landscape.
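
To make "low- and zero-barrier linear paths" concrete, here is a minimal sketch of one common way to measure a loss barrier along the straight line between two parameter vectors. The function name `loss_barrier`, the toy `double_well` loss, and the specific barrier definition (largest excess of the interpolated loss over the linear interpolation of the endpoint losses) are assumptions for illustration; the paper's exact definition may differ.

```python
import numpy as np

def loss_barrier(loss_fn, theta_a, theta_b, num_points=11):
    """Largest excess loss on the segment between theta_a and theta_b.

    Assumed definition: max over t in [0, 1] of
    L((1 - t) * theta_a + t * theta_b) - ((1 - t) * L(theta_a) + t * L(theta_b)).
    """
    loss_a, loss_b = loss_fn(theta_a), loss_fn(theta_b)
    gaps = []
    for t in np.linspace(0.0, 1.0, num_points):
        theta_t = (1.0 - t) * theta_a + t * theta_b  # point on the linear path
        gaps.append(loss_fn(theta_t) - ((1.0 - t) * loss_a + t * loss_b))
    return max(gaps)

# Hypothetical toy loss with two minima at -1 and +1 and a hump in between.
def double_well(theta):
    return float(np.sum((theta ** 2 - 1.0) ** 2))

barrier = loss_barrier(double_well, np.array([-1.0]), np.array([1.0]))
print(f"barrier between the two minima: {barrier:.3f}")
```

A barrier near zero means the two solutions are linearly mode connected under this measure; a large value means the straight path crosses a region of high loss.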
Abstract
Understanding the geometry of neural network loss landscapes is a central question in deep learning, with implications for generalization and optimization. A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths despite appearing to lie in separate loss basins. However, this is often obscured by symmetries in parameter space -- such as neuron permutations -- which make functionally equivalent models appear dissimilar. Prior work has predominantly focused on neuron reordering through permutations, but such approaches are limited in scope and fail to capture the richer symmetries exhibited by modern architectures such as Transformers. In this work, we introduce a unified framework that captures four symmetry classes -- permutations, semi-permutations, orthogonal transformations, and general invertible maps --…
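
The permutation symmetry mentioned in the abstract can be illustrated with a small sketch: reordering the hidden units of a two-layer ReLU network changes its parameters but not its outputs. The network sizes, the helper `mlp`, and the cyclic-shift permutation are arbitrary choices for illustration and are not taken from the paper; the other three symmetry classes are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    """Two-layer ReLU network: x -> relu(x W1^T + b1) W2^T + b2."""
    hidden = np.maximum(x @ W1.T + b1, 0.0)
    return hidden @ W2.T + b2

d_in, d_hidden, d_out = 4, 8, 3  # illustrative sizes
W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))
b2 = rng.normal(size=d_out)

# Reorder hidden units with a cyclic shift (any permutation would work).
perm = np.roll(np.arange(d_hidden), 1)
W1_p, b1_p = W1[perm], b1[perm]  # permute rows of the first layer
W2_p = W2[:, perm]               # permute matching columns of the second layer

x = rng.normal(size=(5, d_in))
same_function = np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1_p, b1_p, W2_p, b2))
param_distance = np.linalg.norm(W1 - W1_p)

print("identical outputs:", same_function)
print("distance between first-layer weights:", param_distance)
```

The two parameter sets compute the same function yet sit far apart in parameter space, which is why models are typically aligned before interpolating between them.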
Taxonomy
Topics: Neural Networks and Reservoir Computing · Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis
