Single-Nodal Spontaneous Symmetry Breaking in NLP Models
Shalom Rosner, Ronit D. Gross, Ella Koresh, Ido Kanter

TL;DR
This paper shows that NLP models such as BERT can exhibit spontaneous symmetry breaking down to the single-node level during training, which shapes what individual nodes are able to learn and how nodes cooperate as their number grows.
Contribution
It demonstrates that spontaneous symmetry breaking, a phenomenon classically associated with phase transitions in physical systems, occurs in NLP models at the level of individual nodes during both pre-training and fine-tuning.
Findings
Symmetry breaking occurs at the level of individual attention heads.
Nodes acquire the capacity to learn a limited set of tokens after pre-training, or a limited set of labels after fine-tuning on a classification task.
A crossover in learning ability emerges as the number of nodes increases.
Abstract
Spontaneous symmetry breaking in statistical mechanics primarily occurs during phase transitions at the thermodynamic limit, where the Hamiltonian preserves inversion symmetry yet the low-temperature free energy exhibits reduced symmetry. Herein, we demonstrate the emergence of spontaneous symmetry breaking in natural language processing (NLP) models during both pre-training and fine-tuning, even under deterministic dynamics and within a finite training architecture. This phenomenon occurs at the level of individual attention heads, scales down to small subsets of their nodes, and remains valid at the single-node level, where nodes acquire the capacity to learn a limited set of tokens after pre-training or labels after fine-tuning for a specific classification task. As the number of nodes increases, a crossover in learning ability occurs, governed by the tradeoff between a decrease…
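For readers less familiar with the statistical-mechanics background in the abstract's first sentence, the textbook Ising ferromagnet is a common illustration of inversion symmetry and its spontaneous breaking. This sketch is not taken from the paper; the symbols J, s_i, h, m, and T_c are the usual textbook ones and are assumptions here.

```latex
% Illustrative textbook example (nearest-neighbour Ising ferromagnet); not from the paper.
% Hamiltonian with coupling J > 0 and spins s_i = +1 or -1:
H(\{s_i\}) = -J \sum_{\langle i,j \rangle} s_i s_j , \qquad s_i \in \{-1,+1\}

% Inversion symmetry: flipping every spin leaves the energy unchanged.
H(\{-s_i\}) = H(\{s_i\})

% Spontaneous symmetry breaking: below the critical temperature T_c (in dimension d >= 2),
% the magnetization stays nonzero when the system size N is taken to infinity first
% and the external field h is then removed.
m = \lim_{h \to 0^{+}} \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \langle s_i \rangle_{h} \neq 0 \qquad (T < T_c)
```

In any finite system at zero field the same average vanishes by symmetry, which is why the abstract stresses that the effect reported for NLP models appears within a finite architecture and under deterministic dynamics.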
Taxonomy
Topics: Quantum many-body systems · Machine Learning in Materials Science · Model Reduction and Neural Networks
