Expansion Span: Combining Fading Memory and Retrieval in Hybrid State Space Models
Elvis Nunez, Luca Zancato, Benjamin Bowman, Aditya Golatkar, Wei Xia, Stefano Soatto

TL;DR
This paper introduces Span-Expanded Attention (SE-Attn), which expands the memory span of hybrid state space models through relevancy-based token retrieval, enabling efficient long-sequence processing.
Contribution
The paper proposes a mechanism that lets hybrid models access tokens from the distant past without extra hardware resources, and a fine-tuning method, HyLoRA, for adapting models to longer sequences.
Findings
SE-Attn extends the memory span by up to 8 times.
HyLoRA with SE-Attn outperforms LongLoRA on long-range NLP tasks.
The approach is more cost-effective and maintains high performance.
Abstract
The "state" of State Space Models (SSMs) represents their memory, which fades exponentially over an unbounded span. By contrast, Attention-based models have "eidetic" (i.e., verbatim, or photographic) memory over a finite span (context size). Hybrid architectures combine State Space layers with Attention, but still cannot recall the distant past and can access only the most recent tokens eidetically. Unlike current methods of combining SSM and Attention layers, we allow the state to be allocated based on relevancy rather than recency. In this way, for every new set of query tokens, our models can "eidetically" access tokens from beyond the Attention span of current Hybrid SSMs without requiring extra hardware resources. We introduce a method to expand the memory span of the hybrid state by "reserving" a fraction of the Attention context for tokens retrieved from arbitrarily distant in…
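The retrieval idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the block layout, the mean-vector summaries, the dot-product relevance score, and the names `retrieve_blocks` and `span_expanded_attention` are all assumptions made for illustration, and causal masking is omitted for brevity. It only shows the general pattern of reserving part of the Attention context for past tokens chosen by relevancy rather than recency.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vector(vectors):
    # Average a list of equal-length vectors, one dimension at a time.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def retrieve_blocks(query_chunk, memory_blocks, top_k):
    """Return indices of the top_k past blocks most relevant to the query chunk.

    Assumed scoring rule (hypothetical): dot product between the mean query
    vector and the mean key vector of each stored block.
    """
    q_summary = mean_vector(query_chunk)
    scores = [dot(q_summary, mean_vector(b["keys"])) for b in memory_blocks]
    ranked = sorted(range(len(memory_blocks)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])  # keep retrieved blocks in chronological order

def span_expanded_attention(queries, local_keys, local_values, memory_blocks, top_k):
    """Attend over retrieved past blocks followed by the local chunk."""
    chosen = retrieve_blocks(queries, memory_blocks, top_k)
    # The retrieved tokens occupy the "reserved" part of the attention context.
    keys = [k for i in chosen for k in memory_blocks[i]["keys"]] + local_keys
    values = [v for i in chosen for v in memory_blocks[i]["values"]] + local_values
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        weights = softmax([scale * dot(q, k) for k in keys])
        outputs.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return outputs, chosen

# Toy example: three stored blocks from the distant past, one local token.
memory_blocks = [
    {"keys": [[1.0, 0.0], [1.0, 0.0]], "values": [[1.0, 0.0], [1.0, 0.0]]},
    {"keys": [[0.0, 1.0], [0.0, 1.0]], "values": [[0.0, 1.0], [0.0, 1.0]]},
    {"keys": [[-1.0, 0.0], [-1.0, 0.0]], "values": [[-1.0, 0.0], [-1.0, 0.0]]},
]
queries = [[0.0, 2.0], [0.0, 3.0]]
local_keys = [[0.5, 0.5]]
local_values = [[0.5, 0.5]]

outputs, chosen = span_expanded_attention(
    queries, local_keys, local_values, memory_blocks, top_k=1
)
print(chosen)  # index of the block selected by relevancy, not by recency
```

In this toy setup the queries align with the second stored block, so that block is retrieved even though it is not the most recent one. The attention context stays at a fixed size (the retrieved tokens plus the local chunk), which is the sense in which the span grows without a larger attention window.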
Taxonomy
Topics: Cellular Automata and Applications · Algorithms and Data Compression
Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training
