Remove Symmetries to Control Model Expressivity and Improve Optimization
Liu Ziyin, Yizhou Xu, Isaac Chuang

TL;DR
This paper demonstrates how symmetries in loss functions can cause neural networks to become trapped in low-capacity states, and introduces a simple, model-agnostic algorithm called syre to remove these symmetries, thereby improving training and performance.
Contribution
The paper provides a theoretical analysis of symmetry-induced capacity reduction and proposes a novel, model-agnostic algorithm to mitigate this issue in neural networks.
Findings
Removing symmetries improves optimization and performance.
The syre algorithm effectively eliminates symmetry-induced traps.
The method is theoretically justified and broadly applicable.
Abstract
When symmetry is present in the loss function, the model is likely to be trapped in a low-capacity state that is sometimes known as a "collapse". Being trapped in these low-capacity states can be a major obstacle to training across many scenarios where deep learning technology is applied. We first prove two concrete mechanisms through which symmetries lead to reduced capacities and ignored features during training and inference. We then propose a simple and theoretically justified algorithm, syre, to remove almost all symmetry-induced low-capacity states in neural networks. When this type of entrapment is especially a concern, removing symmetries with the proposed method is shown to correlate well with improved optimization or performance. A remarkable merit of the proposed method is that it is model-agnostic and does not require any knowledge of the symmetry.
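To make the idea of a symmetry-induced low-capacity state concrete, here is a minimal sketch. It is not the paper's syre algorithm, nor necessarily one of the two mechanisms the abstract refers to; it only illustrates the well-known permutation symmetry between hidden units. The network, data, learning rate, and the name `train_two_neurons` are all assumptions made for illustration. When two hidden neurons start with identical weights, they receive identical gradients and stay identical, so the network behaves like a one-neuron model.

```python
import numpy as np

def train_two_neurons(w_init, a_init, steps=200, lr=0.05, seed=0):
    """Full-batch gradient descent on a toy model f(x) = sum_j a_j * tanh(w_j . x).

    Hypothetical illustration: the data and hyperparameters are arbitrary.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(64, 3))
    y = np.sin(x[:, 0]) + 0.5 * x[:, 1]   # arbitrary regression target
    W = np.array(w_init, dtype=float)     # hidden weights, shape (2, 3)
    a = np.array(a_init, dtype=float)     # output weights, shape (2,)
    n = len(y)
    for _ in range(steps):
        h = np.tanh(x @ W.T)              # hidden activations, shape (64, 2)
        err = h @ a - y                   # residuals, shape (64,)
        # Gradients of the loss 0.5 * mean(err**2)
        grad_a = h.T @ err / n
        grad_W = ((err[:, None] * a) * (1.0 - h**2)).T @ x / n
        a -= lr * grad_a
        W -= lr * grad_W
    return W, a

if __name__ == "__main__":
    # Symmetric start: both neurons share the same weights.
    W_sym, a_sym = train_two_neurons(np.tile([0.3, -0.2, 0.1], (2, 1)), [0.5, 0.5])
    print("symmetric init, neurons still identical:", np.allclose(W_sym[0], W_sym[1]))

    # Asymmetric start: the neurons are free to learn different features.
    W_asym, a_asym = train_two_neurons([[0.3, -0.2, 0.1], [-0.4, 0.5, 0.2]], [0.5, -0.3])
    print("asymmetric init, neurons identical:", np.allclose(W_asym[0], W_asym[1]))
```

In this toy setting, gradient descent never leaves the symmetric subspace on its own. That is the kind of entrapment the abstract describes, and any perturbation that breaks the symmetry lets the two units diverge.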
Peer Reviews
Decision: ICLR 2025 Poster
1. The proposed method is theoretically sound. The authors have rigorously proved many useful results, such as Proposition 3, which shows that symmetry reduces parameter dimension.
2. Their proposed solution is generalizable across different architectures and does not require detailed knowledge of the model's symmetries.
3. The empirical evaluations demonstrate significant improvements in model performance and capacity, particularly in challenging scenarios like continual learning and self-supervised learning.
1. Some of the theoretical explanations are dense and might be challenging for readers not familiar with the detailed aspects of symmetry in neural networks. Despite having decent knowledge of symmetries in neural networks, I still struggled a bit with some sections, such as Section 5. Summarizing early on what the main theoretical result of each section is might make it easier for readers to appreciate the contributions of the paper.
2. The limitations of the method are not thoroughly discussed, which
- As noted, the authors provide some plausible and interesting explanations of mechanisms by which symmetry may reduce capacity, nontrivially building on (Ziyin 2024).
- In some settings and experiments, their algorithm (syre) seems to usefully outperform baselines/another symmetry removal method?
Experimental shortcomings
- The authors make strong claims around the ability of symmetry-removal to improve performance for a) SSL, b) a supervised continual learning problem, and c) an RL continual learning problem. They do not implement proper baseline comparisons for any of these settings nor go beyond toy experiments. If they want to claim that (some degree of) symmetry removal is helpful for training models in practical settings, they need to either show that a) symmetry removal is reasona
Here are the main strengths of the paper:
1. Addresses an important practical problem of symmetry-induced model collapse that affects multiple domains in deep learning (VAEs, continual learning, self-supervised learning).
2. Proposes a remarkably simple solution that requires minimal code changes and no architectural modifications, making it easily adoptable in practice.
3. Shows promising results in continual learning scenarios, demonstrating improved plasticity and performance maintenance o
1. **Conceptual confusion about symmetries**:
   - The paper discusses "removing symmetries" but primarily analyzes fixed points where $P\theta_0 = 0$
   - This misses the key aspect of symmetries - their group orbits and the associated degeneracies
   - In the simple case of a two-layer linear network $F(x) = UVx$, the important symmetry structure lies in the $GL(h)$ orbits $(U,V) \to (Ug, g^{-1}V)$, not in fixed points
2. **Overly restrictive theoretical framework**:
   - The condition $P\th
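The $GL(h)$ symmetry this reviewer mentions for a two-layer linear network can be checked numerically. The sketch below is only an illustration of that invariance, with arbitrary dimensions and the hypothetical helper names `random_invertible` and `gl_transform`; it is not taken from the paper or the review.

```python
import numpy as np

def random_invertible(h, rng):
    """Build a well-conditioned invertible h x h matrix (orthogonal times diagonal)."""
    q, _ = np.linalg.qr(rng.normal(size=(h, h)))
    scales = np.linspace(0.5, 2.0, h)      # nonzero scales keep g invertible
    return q @ np.diag(scales)

def gl_transform(U, V, g):
    """Apply (U, V) -> (U g, g^{-1} V)."""
    return U @ g, np.linalg.solve(g, V)    # solve(g, V) computes g^{-1} V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, h, d_out = 4, 3, 2               # arbitrary sizes for illustration
    U = rng.normal(size=(d_out, h))
    V = rng.normal(size=(h, d_in))
    g = random_invertible(h, rng)
    U2, V2 = gl_transform(U, V, g)

    x = rng.normal(size=(d_in, 5))
    # The parameters change, but the function F(x) = U V x does not.
    print("parameters changed:", not np.allclose(U, U2))
    print("outputs unchanged:", np.allclose(U @ V @ x, U2 @ V2 @ x))
```

Because every point on such an orbit computes the same function, any loss that depends on the parameters only through $F$ takes the same value along the whole orbit.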
Taxonomy
Topics: Neural Networks and Applications
