Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
Binchi Zhang, Zaiyi Zheng, Zhengzhang Chen, Jundong Li

TL;DR
This paper introduces rotation symmetry, a continuous parameter space symmetry for transformers, enabling more effective model fusion by expanding the equivalence set beyond permutation symmetry.
Contribution
The paper proposes rotation symmetry as a new form of parameter space symmetry for transformers and develops a theoretically optimal matching algorithm for model fusion.
Findings
Rotation symmetry significantly improves model fusion performance.
The matching algorithm is effective across NLP and vision tasks.
Code is publicly available for reproducibility.
Abstract
Symmetry in the parameter space of deep neural networks (DNNs) has proven beneficial for various deep learning applications. A well-known example is the permutation symmetry in Multi-Layer Perceptrons (MLPs), where permuting the rows of weight matrices in one layer and applying the inverse permutation to adjacent layers yields a functionally equivalent model. While permutation symmetry fully characterizes the equivalence set for MLPs, its discrete nature limits its utility for transformers. In this paper, we introduce rotation symmetry, a novel form of parameter space symmetry for transformers that generalizes permutation symmetry by rotating parameter matrices in self-attention layers. Unlike permutation symmetry, rotation symmetry operates in a continuous domain, thereby significantly expanding the equivalence set for transformers. Based on this property, we propose a theoretically…
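The two symmetries described in the abstract can be checked numerically. The sketch below is illustrative only and is not the authors' code: it assumes a two-layer ReLU MLP for the permutation case and, for the rotation case, a single attention head whose logits are the scaled product of queries and keys. All function names (`mlp_forward`, `attention_logits`, `random_orthogonal`, `permutation_gap`, `rotation_gap`) are hypothetical. Under these assumptions, multiplying both the query and key projections by the same orthogonal matrix R leaves the logits unchanged because R Rᵀ = I.

```python
import numpy as np


def mlp_forward(x, W1, b1, W2, b2):
    """Two-layer ReLU MLP; W1 is (hidden, in), W2 is (out, hidden)."""
    h = np.maximum(x @ W1.T + b1, 0.0)
    return h @ W2.T + b2


def attention_logits(x, Wq, Wk):
    """Scaled dot-product logits for one head; Wq and Wk are (d_model, d_head)."""
    q, k = x @ Wq, x @ Wk
    return (q @ k.T) / np.sqrt(Wq.shape[1])


def random_orthogonal(d, rng):
    """Random d x d orthogonal matrix taken from a QR decomposition."""
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q


def permutation_gap(rng):
    """Max output difference after permuting the hidden units of the MLP."""
    x = rng.standard_normal((5, 4))
    W1, b1 = rng.standard_normal((8, 4)), rng.standard_normal(8)
    W2, b2 = rng.standard_normal((3, 8)), rng.standard_normal(3)
    perm = rng.permutation(8)
    original = mlp_forward(x, W1, b1, W2, b2)
    # Permute rows of layer 1 (and its bias), then the matching columns of layer 2.
    permuted = mlp_forward(x, W1[perm], b1[perm], W2[:, perm], b2)
    return np.abs(original - permuted).max()


def rotation_gap(rng):
    """Max logit difference after rotating Wq and Wk by the same orthogonal R."""
    x = rng.standard_normal((5, 16))
    Wq, Wk = rng.standard_normal((16, 4)), rng.standard_normal((16, 4))
    R = random_orthogonal(4, rng)
    original = attention_logits(x, Wq, Wk)
    # (x Wq R)(x Wk R)^T = x Wq R R^T Wk^T x^T = x Wq Wk^T x^T
    rotated = attention_logits(x, Wq @ R, Wk @ R)
    return np.abs(original - rotated).max()


rng = np.random.default_rng(0)
print("permutation gap:", permutation_gap(rng))
print("rotation gap:", rotation_gap(rng))
```

Both gaps should sit at floating-point noise level. A permutation matrix is itself orthogonal, which is one way to see why the rotation case contains the permutation case; the ReLU in the MLP is what restricts that layer to permutations, whereas the query-key product has no elementwise nonlinearity between the two projections.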
Taxonomy
Topics: Astro and Planetary Science · Cold Atom Physics and Bose-Einstein Condensates
Methods: Sparse Evolutionary Training
