go-$m$HC: Direct Parameterization of Manifold-Constrained Hyper-Connections via Generalized Orthostochastic Matrices
Torque Dandachi, Sophia Diggs-Galligan

TL;DR
The paper introduces go-mHC, an efficient and exact parameterization of doubly stochastic matrices for dynamic layer connectivity, improving expressivity and scalability in neural networks.
Contribution
It proposes a novel generalized orthostochastic matrix parameterization that scales as O(d^3), interpolates between efficiency and expressivity, and enhances manifold-constrained hyper-connections.
Findings
go-mHC reaches the theoretical minimum loss on synthetic tasks.
It converges up to 10 times faster than baselines.
It is validated in a 30M-parameter GPT-style model.
Abstract
Doubly stochastic matrices enable learned mixing across residual streams, but parameterizing the set of doubly stochastic matrices (the Birkhoff polytope) exactly and efficiently remains an open challenge. Existing exact methods scale factorially with the number of streams ($d$), while Kronecker-factorized approaches are efficient but expressivity-limited. We introduce a novel exact parameterization grounded in the theory of generalized orthostochastic matrices, which scales as $O(d^3)$ and exposes a single hyperparameter that continuously interpolates between a computationally efficient boundary and the fully expressive Birkhoff polytope. Building on Manifold-Constrained Hyper-Connections ($m$HC), a framework for learned dynamic layer connectivity, we instantiate this parameterization in go-$m$HC. Our method composes naturally with Kronecker-factorized methods,…
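To make the idea concrete, here is a minimal NumPy sketch of one standard way to obtain a doubly stochastic matrix from an orthogonal one. It is an illustration under stated assumptions, not the paper's implementation. It assumes the usual definition of a generalized orthostochastic matrix: take an orthogonal matrix of size $ds \times ds$, split it into a $d \times d$ grid of $s \times s$ blocks, and set each entry to the squared Frobenius norm of its block divided by $s$. The function name `generalized_orthostochastic`, the block size `s`, and the QR step used to obtain an orthogonal matrix are all hypothetical choices made for this example; the abstract does not say how the orthogonal factor is parameterized or name its hyperparameter.

```python
import numpy as np


def generalized_orthostochastic(params: np.ndarray, d: int, s: int) -> np.ndarray:
    """Map an unconstrained (d*s, d*s) array to a d x d doubly stochastic matrix.

    Assumption: QR is used here only as a convenient way to get an orthogonal
    matrix; the paper may use a different parameterization.
    """
    n = d * s
    # Orthogonalize the unconstrained parameters: q has orthonormal rows and columns.
    q, _ = np.linalg.qr(params.reshape(n, n))
    # View q as a d x d grid of s x s blocks: blocks[i, a, j, b] = q[i*s + a, j*s + b].
    blocks = q.reshape(d, s, d, s)
    # Entry (i, j) is the squared Frobenius norm of block (i, j), divided by s.
    return (blocks ** 2).sum(axis=(1, 3)) / s


rng = np.random.default_rng(0)
d, s = 4, 2
B = generalized_orthostochastic(rng.standard_normal((d * s, d * s)), d, s)

print(B.shape)        # (4, 4)
print(B.sum(axis=0))  # each column sums to 1 (up to floating-point error)
print(B.sum(axis=1))  # each row sums to 1 (up to floating-point error)
```

Each group of $s$ rows of an orthogonal matrix has total squared norm $s$, so every row of `B` sums to one, and the same argument applies to columns. With `s = 1` the construction reduces to squaring the entries of a $d \times d$ orthogonal matrix, which yields an ordinary orthostochastic matrix.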
