Why Geometric Continuity Emerges in Deep Neural Networks: Residual Connections and Rotational Symmetry Breaking
Kyungwon Jeong, Won-Gi Paeng, Honggyo Suh

TL;DR
This paper traces geometric continuity in deep neural networks, the tendency of adjacent layers' principal singular vectors to point in similar directions, to two mechanisms: residual connections and symmetry-breaking nonlinearities.
Contribution
It identifies residual connections and symmetry-breaking nonlinearities as key mechanisms behind geometric continuity in deep networks, supported by experiments on toy models and transformers.
Findings
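Residual connections create cross-layer gradient coherence, which aligns weight updates across layers.
Symmetry-breaking nonlinearities constrain all layers to a shared coordinate frame, preventing rotation drift.
Activation and normalization play distinct roles: activation concentrates continuity in the leading singular direction, whereas normalization distributes it.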
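To make the symmetry-breaking idea concrete, here is a minimal NumPy sketch, not taken from the paper. It checks whether an activation commutes with a random orthogonal change of basis. The `radial_tanh` activation is a hypothetical example of a nonlinear but rotation-preserving function, chosen only for illustration. The paper's actual rotation-preserving activation is not specified in this summary.

```python
import numpy as np

def relu(x):
    # Elementwise ReLU: depends on the coordinate axes, so it breaks rotational symmetry.
    return np.maximum(x, 0.0)

def radial_tanh(x):
    # Hypothetical rotation-preserving nonlinearity (an assumption for illustration):
    # each row vector is rescaled by a function of its Euclidean norm only.
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x * np.tanh(norm) / np.maximum(norm, 1e-12)

def random_orthogonal(dim, rng):
    # QR decomposition of a Gaussian matrix yields an orthogonal matrix Q.
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))

def equivariance_gap(act, q, x):
    # Largest absolute difference between "rotate, then activate"
    # and "activate, then rotate". Zero means the activation preserves the symmetry.
    return float(np.max(np.abs(act(x @ q.T) - act(x) @ q.T)))

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 8))   # batch of 256 vectors in 8 dimensions
q = random_orthogonal(8, rng)

relu_gap = equivariance_gap(relu, q, x)
radial_gap = equivariance_gap(radial_tanh, q, x)
print(f"ReLU gap: {relu_gap:.3e}, radial gap: {radial_gap:.3e}")
```

The radial activation gives a gap at floating-point precision, because orthogonal maps preserve norms. ReLU gives a clearly nonzero gap, meaning it singles out a preferred coordinate frame.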
Abstract
Weight matrices in deep networks exhibit geometric continuity -- principal singular vectors of adjacent layers point in similar directions. While this property has been widely observed, its origin remains unexplained. Through experiments on toy MLPs and small transformers, we identify two mechanisms: residual connections create cross-layer gradient coherence that aligns weight updates across layers, and symmetry-breaking nonlinearities constrain all layers to a shared coordinate frame, preventing the rotation drift that would otherwise destabilize weight structure. Crucially, a nonlinear but rotation-preserving activation fails to retain continuity, isolating symmetry breaking -- not nonlinearity itself -- as the active ingredient. Activation and normalization play distinct roles: activation concentrates continuity in the leading singular direction, while normalization distributes it…
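One way to quantify the property described in the abstract is the absolute cosine similarity between the leading singular vectors of adjacent weight matrices. The sketch below is an illustration under stated assumptions, not the authors' metric: it uses right singular vectors, assumes all layers share the same input dimension, and runs on synthetic matrices rather than trained weights. The function names are hypothetical.

```python
import numpy as np

def top_right_singular_vector(w):
    # Rows of vt are right singular vectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(w, full_matrices=False)
    return vt[0]

def adjacent_alignment(weights):
    # |cos| between leading singular vectors of consecutive layers.
    # The absolute value removes the sign ambiguity of singular vectors.
    tops = [top_right_singular_vector(w) for w in weights]
    return [abs(float(a @ b)) for a, b in zip(tops[:-1], tops[1:])]

def unit(v):
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
dim, n_layers = 32, 4

# Synthetic "continuous" stack: every layer has a strong rank-one component
# along the same input direction v, plus small independent noise.
v = unit(rng.standard_normal(dim))
shared = [
    10.0 * np.outer(unit(rng.standard_normal(dim)), v)
    + 0.1 * rng.standard_normal((dim, dim))
    for _ in range(n_layers)
]

# Baseline: independent random matrices with no shared structure.
independent = [rng.standard_normal((dim, dim)) for _ in range(n_layers)]

shared_scores = adjacent_alignment(shared)
independent_scores = adjacent_alignment(independent)
print("shared direction:", np.round(shared_scores, 3))
print("independent:     ", np.round(independent_scores, 3))
```

Scores near 1 indicate that adjacent layers share a leading direction, while independent random matrices typically score much lower. Applying the same function to the weights of a trained network would require choosing which matrices to compare and handling mismatched shapes, neither of which this sketch addresses.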
