JPmHC: Dynamical Isometry via Orthogonal Hyper-Connections
Biswa Sengupta, Jinhua Wang, Leo Brunswic

TL;DR
JPmHC introduces a spectrum-preserving hyper-connection framework that improves deep network stability and efficiency by controlling gradient spectra through manifold-constrained mixers, leading to better training dynamics and scalability.
Contribution
The paper proposes a Jacobian-spectrum-preserving hyper-connection method with manifold constraints, supported by theoretical analysis, a memory-efficient implementation, and empirical validation of improved training stability.
Findings
Faster convergence and higher accuracy on ARC-AGI
Lower computational cost compared to baselines
Spectral theory predictions align with empirical results
Abstract
Recent advances in deep learning, exemplified by Hyper-Connections (HC), have expanded the residual connection paradigm by introducing wider residual streams and diverse connectivity patterns. While these innovations yield significant performance gains, they compromise the identity mapping property of residual connections, leading to training instability, limited scalability, and increased memory overhead. To address these challenges, we propose JPmHC (Jacobian-spectrum Preserving manifold-constrained Hyper-Connections), a framework that replaces identity skips with a trainable linear mixer acting on n parallel streams while explicitly controlling gradient conditioning. By constraining the mixer M on operator-norm-bounded manifolds (e.g., bistochastic, Stiefel, Grassmann), JPmHC prevents gradient pathologies and enhances stability. JPmHC introduces three key contributions: (i) a…
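To make the operator-norm bound concrete, here is a minimal sketch of one constraint the abstract names, the bistochastic case. It assumes Sinkhorn normalization as the way to reach a doubly stochastic mixer; the abstract does not say how the paper parameterizes M, so this is an illustrative stand-in rather than the authors' method. The function names `sinkhorn` and `spectral_norm` are hypothetical, and n = 4 streams is an arbitrary choice.

```python
import math
import random


def sinkhorn(logits, n_iters=50):
    """Approximate a doubly stochastic matrix from unconstrained logits.

    Assumption: alternating row/column normalisation (Sinkhorn) stands in
    for whatever manifold parameterisation the paper actually uses.
    """
    n = len(logits)
    # Exponentiate so every entry is strictly positive.
    m = [[math.exp(x) for x in row] for row in logits]
    for _ in range(n_iters):
        # Make every row sum to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Make every column sum to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def spectral_norm(m, n_iters=200):
    """Estimate the largest singular value by power iteration on M^T M."""
    n = len(m)
    # Deterministic, non-uniform start vector (hypothetical choice).
    v = [float(i + 1) for i in range(n)]
    for _ in range(n_iters):
        mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        w = [sum(m[i][j] * mv[i] for i in range(n)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # v is unit length here, so |M v| estimates the top singular value.
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(x * x for x in mv))


if __name__ == "__main__":
    random.seed(0)
    n = 4  # number of parallel residual streams (illustrative)
    logits = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    mixer = sinkhorn(logits)
    print("row sums:", [round(sum(row), 6) for row in mixer])
    print("spectral norm:", spectral_norm(mixer))
```

A doubly stochastic matrix has induced 1-norm and infinity-norm both equal to 1, so its spectral norm is at most 1. Mixing the streams with such a matrix therefore cannot amplify the backward signal through the skip path, which is the kind of gradient conditioning the abstract describes.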
Taxonomy
Topics: Advanced Graph Neural Networks · Topological and Geometric Data Analysis · Model Reduction and Neural Networks
