KromHC: Manifold-Constrained Hyper-Connections with Kronecker-Product Residual Matrices
Wuyang Zhou, Yuxuan Gu, Giorgos Iacovides, Danilo Mandic

TL;DR
KromHC introduces a scalable, manifold-constrained hyper-connection method using Kronecker products to ensure exact doubly stochastic residual matrices, improving training stability and efficiency in neural networks.
Contribution
The paper proposes KromHC, which uses Kronecker products of smaller matrices to parametrize residual matrices that are exactly doubly stochastic, reducing parameter complexity and improving performance.
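The property this contribution relies on is that a Kronecker product of doubly stochastic matrices is itself doubly stochastic. The following is a minimal sketch of that fact only, not the paper's parametrization: the helper names (`kron`, `two_by_two`, `is_doubly_stochastic`) and the choice of 2x2 factors are assumptions made for illustration, and exact rational arithmetic is used so the check involves no rounding.

```python
from fractions import Fraction


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def is_doubly_stochastic(m):
    """True if m is square, non-negative, and every row and column sums to 1."""
    n = len(m)
    if any(len(row) != n for row in m):
        return False
    nonneg = all(x >= 0 for row in m for x in row)
    rows_ok = all(sum(row) == 1 for row in m)
    cols_ok = all(sum(m[i][j] for i in range(n)) == 1 for j in range(n))
    return nonneg and rows_ok and cols_ok


def two_by_two(t):
    """Hypothetical 2x2 factor: a convex mix of identity and swap, t in [0, 1]."""
    t = Fraction(t)
    return [[1 - t, t], [t, 1 - t]]


# Three 2x2 factors (one free scalar each) build an 8x8 residual matrix.
factors = [two_by_two(Fraction(1, 3)), two_by_two(Fraction(1, 5)), two_by_two(Fraction(3, 4))]
residual = factors[0]
for factor in factors[1:]:
    residual = kron(residual, factor)

print(len(residual), is_doubly_stochastic(residual))
```

In this toy setting the 8x8 matrix is described by three scalars rather than 64 entries, which hints at why a Kronecker factorization can keep the parameter count low while the row and column sums stay exactly 1.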
Findings
KromHC matches or outperforms state-of-the-art methods.
It reduces parameter complexity to O(n^2 C), where n is the width of the residual stream and C is the feature dimension.
Experiments validate improved stability and scalability.
Abstract
The success of Hyper-Connections (HC) in neural networks (NN) has also highlighted issues related to its training instability and restricted scalability. The Manifold-Constrained Hyper-Connections (mHC) mitigate these challenges by projecting the residual connection space onto a Birkhoff polytope; however, mHC faces two issues: 1) its iterative Sinkhorn-Knopp (SK) algorithm does not always yield exact doubly stochastic residual matrices; 2) mHC incurs a prohibitive parameter complexity, with n as the width of the residual stream and C as the feature dimension. The recently proposed mHC-lite reparametrizes the residual matrix via the Birkhoff-von-Neumann theorem to guarantee double stochasticity, but also faces a factorial explosion in its parameter complexity. To address both challenges, we propose KromHC, which…
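The two issues named in the abstract can be illustrated with a small sketch. It is not the mHC or mHC-lite implementation: the function names, the 2x2 test matrix, and the softmax weighting over permutations are assumptions chosen for illustration. The first part shows that a fixed number of Sinkhorn-Knopp iterations leaves the row sums away from 1; the second builds a matrix as a convex combination of permutation matrices (the Birkhoff-von-Neumann form), which needs one weight per permutation, that is n! weights.

```python
import math
from itertools import permutations


def sinkhorn_knopp(m, iters):
    """Alternately normalise the rows, then the columns, of a positive matrix."""
    n = len(m)
    m = [row[:] for row in m]
    for _ in range(iters):
        m = [[x / sum(row) for x in row] for row in m]
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def max_row_sum_error(m):
    """Largest deviation of any row sum from 1."""
    return max(abs(sum(row) - 1.0) for row in m)


def birkhoff_matrix(logits, n):
    """Convex combination of all n! permutation matrices (softmax weights assumed)."""
    perms = list(permutations(range(n)))
    if len(logits) != len(perms):
        raise ValueError("need one logit per permutation, i.e. n! logits")
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    out = [[0.0] * n for _ in range(n)]
    for w, perm in zip(weights, perms):
        for row, col in enumerate(perm):
            out[row][col] += w / total
    return out


start = [[100.0, 1.0], [1.0, 1.0]]
err_few = max_row_sum_error(sinkhorn_knopp(start, 5))     # still visibly off
err_many = max_row_sum_error(sinkhorn_knopp(start, 200))  # close to the limit

bvn = birkhoff_matrix([0.1 * k for k in range(math.factorial(4))], 4)
weights_needed = {n: math.factorial(n) for n in (2, 4, 8)}

print(err_few, err_many, weights_needed)
```

The weight counts grow as 2, 24, and 40320 for n = 2, 4, and 8, which is the factorial growth the abstract refers to.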
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Graph Neural Networks · Adversarial Robustness in Machine Learning
