Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers
Sukjun Hwang, Aakash Lahoti, Tri Dao, Albert Gu

TL;DR
Hydra introduces a bidirectional state space model based on matrix mixers that unifies and extends sequence modeling techniques, achieving superior performance on NLP and vision benchmarks by leveraging structured matrix parameterizations.
Contribution
The paper presents Hydra, a novel bidirectional state space model using quasiseparable matrix mixers, unifying various sequence models and demonstrating improved efficiency and accuracy.
Findings
Hydra outperforms BERT by 0.8 points on GLUE.
Hydra achieves 2% higher Top-1 accuracy on ImageNet.
The matrix mixer framework explains the success of Transformers and SSMs.
Abstract
A wide array of sequence models are built on a framework modeled after Transformers, comprising alternating sequence mixer and channel mixer layers. This paper studies a unifying matrix mixer view of sequence mixers that can be conceptualized as a linear map on the input sequence. This framework encompasses a broad range of well-known sequence models, including the self-attention of Transformers as well as recent strong alternatives such as structured state space models (SSMs), and allows understanding downstream characteristics such as efficiency and expressivity through properties of their structured matrix class. We identify a key axis of matrix parameterizations termed sequence alignment, which increases the flexibility and performance of matrix mixers, providing insights into the strong performance of Transformers and recent SSMs such as Mamba. Furthermore, the matrix mixer…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsDomain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Attention Is All You Need · Residual Connection · Hydra · Layer Normalization · Linear Layer · Attention Dropout · Linear Warmup With Linear Decay · Adam · Dropout
