mHC-SSM: Manifold-Constrained Hyper-Connections for State Space Language Models with Stream-Specialized Adapters
Abdulvahap Mutlu, Şengül Doğan, Türker Tuncer

TL;DR
This paper introduces a manifold-constrained multi-stream residual mixing technique for state space models in language modeling, demonstrating measurable improvements in validation loss and perplexity on WikiText-2.
Contribution
It adapts mHC-style constrained multi-stream residual mixing to SSM blocks and adds stream-specialized adapters, lowering validation loss and perplexity on WikiText-2.
Findings
Static mHC reduces validation loss from 6.3507 to 6.2448.
mHC with adapters further reduces validation loss to 6.1353.
Validation perplexity decreases from 572.91 for the baseline to 461.88 for mHC with adapters.
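The reported losses and perplexities can be cross-checked against each other. The short sketch below assumes the usual convention that perplexity is the exponential of the mean cross-entropy loss in nats; the summary does not state this convention, so treat it as an assumption. The variable names are illustrative.

```python
import math

# Validation losses as reported in the findings.
baseline_loss = 6.3507
adapter_loss = 6.1353

# Assumption: perplexity = exp(mean cross-entropy in nats).
baseline_ppl = math.exp(baseline_loss)
adapter_ppl = math.exp(adapter_loss)

# Compare against the reported perplexities (572.91 and 461.88).
print(f"baseline: {baseline_ppl:.2f}, mHC + adapters: {adapter_ppl:.2f}")
```

Under that assumption, both computed values agree with the reported perplexities to within the rounding of the four-decimal losses, which suggests that 461.88 corresponds to the mHC-with-adapters configuration.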
Abstract
Manifold-Constrained Hyper-Connections (mHC) introduce a stability-motivated variant of multi-stream residual mixing by constraining residual stream mixing matrices to the manifold of doubly stochastic matrices via Sinkhorn-Knopp projection. In this work, we study whether mHC-style constrained multi-stream residual topology transfers effectively to state space model (SSM) language modeling. We implement a static mHC mechanism around an SSM block by expanding the residual stream into multiple parallel streams, aggregating streams into a single SSM input through simplex-constrained pre-mixing, scattering the SSM output back to streams through simplex-constrained post-mixing, and applying Sinkhorn-projected residual stream mixing at each layer. We further introduce stream-specialized adapters that add lightweight stream-specific capacity through a shared bottleneck with per-stream scaling,…
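The static mHC wrapper described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names (`sinkhorn_knopp`, `softmax`, `mhc_layer`), the use of a softmax to realize the simplex constraint, the iteration count, and the way the block output is added back to the streams are all hypothetical choices, and `block` is a stand-in for the SSM block. The stream-specialized adapters are omitted.

```python
import numpy as np

def sinkhorn_knopp(logits, n_iters=50):
    """Approximately map a square matrix of logits to a doubly stochastic matrix
    by alternating row and column normalization (Sinkhorn-Knopp)."""
    m = np.exp(logits - logits.max())  # strictly positive entries
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # make rows sum to 1
        m = m / m.sum(axis=0, keepdims=True)  # make columns sum to 1
    return m

def softmax(x):
    """Map a vector of logits onto the probability simplex (assumed constraint)."""
    e = np.exp(x - x.max())
    return e / e.sum()

def mhc_layer(streams, block, res_logits, pre_logits, post_logits):
    """One static mHC-style layer around `block`.

    streams:     (n_streams, d_model) parallel residual streams
    res_logits:  (n_streams, n_streams) parameters for residual stream mixing
    pre_logits:  (n_streams,) parameters for aggregating streams into one input
    post_logits: (n_streams,) parameters for scattering the output to streams
    """
    h_res = sinkhorn_knopp(res_logits)   # doubly stochastic stream mixing
    w_pre = softmax(pre_logits)          # simplex-constrained pre-mixing
    w_post = softmax(post_logits)        # simplex-constrained post-mixing
    x = w_pre @ streams                  # single block input, shape (d_model,)
    y = block(x)                         # block output, shape (d_model,)
    return h_res @ streams + np.outer(w_post, y)

# Usage with a placeholder block (np.tanh stands in for an SSM block).
rng = np.random.default_rng(0)
n_streams, d_model = 4, 8
streams = rng.normal(size=(n_streams, d_model))
res_logits = 0.5 * rng.normal(size=(n_streams, n_streams))
out = mhc_layer(streams, np.tanh, res_logits,
                rng.normal(size=n_streams), rng.normal(size=n_streams))
```

Because the columns of a doubly stochastic matrix sum to one, the residual mixing step leaves the sum over streams unchanged, and because its rows sum to one, each mixed stream is a convex combination of the incoming streams. Properties like these are one way to read the "stability-motivated" framing in the abstract.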
