Sparsified State-Space Models are Efficient Highway Networks
Woomin Song, Jihoon Tack, Sangwoo Mo, Seunghyuk Oh, Jinwoo Shin

TL;DR
This paper introduces Simba, a hierarchical sparsification method for state-space models that prunes tokens to create highway-like layers, improving efficiency and information flow in sequence modeling tasks.
Contribution
The paper proposes a novel token pruning criterion and hierarchical sparsification technique for SSMs, enhancing their efficiency and long-range information flow.
Findings
At matched FLOPs, Simba outperforms baseline models on NLP tasks.
Sparsification creates highway-like layers that improve long sequence processing.
The method enhances both efficiency and information flow in SSMs.
Abstract
State-space models (SSMs) offer a promising architecture for sequence modeling, providing an alternative to Transformers by replacing expensive self-attention with linear recurrences. In this paper, we propose a simple yet effective trick to enhance SSMs within given computational budgets by sparsifying them. Our intuition is that tokens in SSMs are highly redundant due to gradual recurrent updates, and dense recurrence operations block the delivery of past information. In particular, we observe that upper layers of SSMs tend to be more redundant as they encode global information, while lower layers encode local information. Motivated by this, we introduce Simba, a hierarchical sparsification method for SSMs based on token pruning. Simba sparsifies upper layers more than lower layers, encouraging the upper layers to behave like highways. To achieve this, we propose a novel token pruning…
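The hierarchical idea in the abstract, pruning more tokens in upper layers than in lower ones, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the linear keep-ratio schedule, the function names (`keep_counts`, `prune_tokens`, `linear_recurrence`, `sparsified_forward`), the toy scalar recurrence standing in for an SSM layer, and especially the magnitude-based score, which is only a placeholder because the paper's actual pruning criterion is not given in the text above.

```python
def keep_counts(seq_len, num_layers, final_keep_ratio):
    """Tokens kept per layer, shrinking linearly from all tokens at the
    bottom layer to final_keep_ratio * seq_len at the top (assumed schedule)."""
    counts = []
    for layer in range(num_layers):
        frac = layer / max(num_layers - 1, 1)
        ratio = 1.0 - frac * (1.0 - final_keep_ratio)
        counts.append(max(1, round(seq_len * ratio)))
    return counts


def prune_tokens(tokens, scores, k):
    """Keep the k highest-scoring tokens, preserving their original order."""
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    keep = sorted(top)
    return [tokens[i] for i in keep], keep


def linear_recurrence(xs, decay=0.9):
    """Toy scalar recurrence h_t = decay * h_{t-1} + x_t (stand-in for an SSM layer)."""
    h, out = 0.0, []
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out


def sparsified_forward(xs, num_layers=4, final_keep_ratio=0.25):
    """Run a stack of toy layers, pruning tokens before each layer."""
    counts = keep_counts(len(xs), num_layers, final_keep_ratio)
    positions = list(range(len(xs)))  # original index of each surviving token
    hidden = list(xs)
    for k in counts:
        # Placeholder criterion (NOT the paper's): score tokens by magnitude.
        scores = [abs(h) for h in hidden]
        hidden, kept = prune_tokens(hidden, scores, min(k, len(hidden)))
        positions = [positions[i] for i in kept]
        hidden = linear_recurrence(hidden)
    return hidden, positions, counts


if __name__ == "__main__":
    xs = [0.1 * ((7 * i) % 11) for i in range(16)]
    hidden, positions, counts = sparsified_forward(xs)
    print("tokens kept per layer:", counts)
    print("surviving positions:", positions)
```

In this toy setting, an upper layer runs its recurrence over fewer tokens, so fewer decay steps separate two distant positions. That is one way to read the "highway" intuition, under the assumptions stated above.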
Taxonomy
Topics: Simulation Techniques and Applications · Advanced Database Systems and Queries · Formal Methods in Verification
Methods: Pruning · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
