TBP-mHC: full expressivity for manifold-constrained hyper connections through transportation polytopes
Anton Lyubinin

TL;DR
This paper introduces TBP-mHC, a novel method for constructing fully expressive, manifold-constrained hyper connections using transportation polytopes, enhancing stability and scalability in residual networks.
Contribution
It proposes Transportation Birkhoff Polytope (TBP) parameterizations that generate doubly stochastic matrices exactly, avoiding iterative normalization and factorial complexity, while maintaining full expressivity.
Findings
Empirical results show competitive performance in language model pre-training.
The method improves training stability and scalability.
The construction yields exactly doubly stochastic matrices with fewer degrees of freedom than the factorial-size permutation parameterization used by mHC-lite.
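To make the degrees-of-freedom comparison concrete, the short sketch below counts parameters for n residual streams. It relies on two standard facts rather than anything specific to the paper: the set of n x n doubly stochastic matrices (the Birkhoff polytope) has dimension (n-1)^2, and a convex combination over all n x n permutation matrices needs n! weights. The function names are illustrative and do not come from the paper.

```python
import math


def birkhoff_dimension(n: int) -> int:
    """Dimension of the polytope of n x n doubly stochastic matrices."""
    return (n - 1) ** 2


def permutation_weight_count(n: int) -> int:
    """Number of weights in a convex combination over all n x n permutation matrices."""
    return math.factorial(n)


# Compare the two counts for a few stream counts n.
for n in (2, 4, 8):
    print(n, birkhoff_dimension(n), permutation_weight_count(n))
```

The factorial count grows far faster than the polytope's dimension, which is the gap a fully expressive but compact parameterization aims to close.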
Abstract
Hyper-Connections (HC) improve residual networks by introducing learnable mixing across multiple residual streams, but unconstrained mixing leads to training instability. Manifold-Constrained Hyper-Connections (mHC) address this by enforcing approximate double stochasticity via Sinkhorn normalization, while mHC-lite ensures exact constraints through convex combinations of permutation matrices at the cost of factorial complexity. KromHC reduces this cost using Kronecker-product parameterizations, but restricts the mixing matrices to a structured submanifold of the Birkhoff polytope. We propose Transportation Birkhoff Polytope (TBP) parameterizations and their Recursive variants (RTBP), which construct exactly doubly stochastic mixing matrices with degrees of freedom. Our approach avoids iterative normalization and combinatorial explosion while preserving full expressivity of…
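The two baselines named in the abstract can be contrasted in a few lines. The sketch below is not the TBP construction, which this summary does not spell out. It only illustrates, under stated assumptions, why Sinkhorn normalization is approximate after finitely many iterations while a convex combination of permutation matrices is exact but needs n! weights. The exponential and softmax reparameterizations, the example logits, and all function names are assumptions made for illustration.

```python
import itertools
import math


def sinkhorn(logits, iters=20):
    """Approximately doubly stochastic matrix via alternating row/column normalization."""
    m = [[math.exp(x) for x in row] for row in logits]  # make entries positive
    n = len(m)
    for _ in range(iters):
        # Row step: every row sums to 1 afterwards.
        m = [[x / sum(row) for x in row] for row in m]
        # Column step: every column sums to 1, which perturbs the row sums again.
        col = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col[j] for j in range(n)] for i in range(n)]
    return m


def permutation_mixture(weight_logits, n):
    """Doubly stochastic matrix as a softmax-weighted sum of all n! permutation matrices."""
    perms = list(itertools.permutations(range(n)))
    assert len(weight_logits) == len(perms), "need one weight per permutation"
    mx = max(weight_logits)
    exps = [math.exp(w - mx) for w in weight_logits]  # numerically stable softmax
    total = sum(exps)
    m = [[0.0] * n for _ in range(n)]
    for w, perm in zip(exps, perms):
        for i, j in enumerate(perm):
            m[i][j] += w / total  # permutation matrix has a 1 at (i, perm[i])
    return m


def max_violation(m):
    """Largest absolute deviation of any row or column sum from 1."""
    n = len(m)
    rows = [abs(sum(m[i]) - 1.0) for i in range(n)]
    cols = [abs(sum(m[i][j] for i in range(n)) - 1.0) for j in range(n)]
    return max(rows + cols)


logits = [[0.5, -1.0, 0.2], [1.5, 0.0, -0.3], [-0.7, 0.8, 0.1]]
perm_logits = [0.1, -0.4, 1.2, 0.0, 0.6, -1.0]  # 3! = 6 weights for n = 3

print("Sinkhorn, 3 iterations:", max_violation(sinkhorn(logits, iters=3)))
print("Permutation mixture:   ", max_violation(permutation_mixture(perm_logits, 3)))
```

With few Sinkhorn iterations the row sums remain visibly off from 1, whereas the permutation mixture satisfies the constraints up to floating-point rounding. The price is the length of `perm_logits`, which grows as n!.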
