Generalization Error Analysis for Selective State-Space Models Through the Lens of Attention
Arya Honarpisheh, Mustafa Bozdag, Octavia Camps, Mario Sznaier

TL;DR
This paper provides a theoretical analysis of selective state-space models, revealing how spectral properties affect their stability and generalization, supported by empirical validation on various sequence tasks.
Contribution
It introduces a novel covering number-based generalization bound for selective SSMs and links the spectral abscissa of the continuous-time state matrix to model stability and generalization.
Findings
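The spectral abscissa of the continuous-time state matrix influences model stability during training.
The theoretical insights are consistent with empirical results on a synthetic majority task, IMDb sentiment classification, and ListOps.
The spectral abscissa also influences how well selective SSMs generalize across sequence lengths.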
Abstract
State-space models (SSMs) have recently emerged as a compelling alternative to Transformers for sequence modeling tasks. This paper presents a theoretical generalization analysis of selective SSMs, the core architectural component behind the Mamba model. We derive a novel covering number-based generalization bound for selective SSMs, building upon recent theoretical advances in the analysis of Transformer models. Using this result, we analyze how the spectral abscissa of the continuous-time state matrix influences the model's stability during training and its ability to generalize across sequence lengths. We empirically validate our findings on a synthetic majority task, the IMDb sentiment classification benchmark, and the ListOps task, demonstrating how our theoretical insights translate into practical model behavior.
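The spectral abscissa mentioned in the abstract is the largest real part among the eigenvalues of the state matrix. The sketch below is only an illustration of why its sign matters across sequence lengths; it does not reproduce the paper's bound. It assumes a diagonal state matrix (so the eigenvalues are simply the diagonal entries), a zero-order-hold style discretization exp(delta_t * a), and positive step sizes. The function names and the numeric values are hypothetical.

```python
import math

def spectral_abscissa_diag(diag):
    """Largest real part among the eigenvalues of a diagonal matrix.

    For a diagonal matrix the eigenvalues are the diagonal entries.
    """
    return max(complex(a).real for a in diag)

def transition_gain(diag, deltas):
    """Largest magnitude of the cumulative state transition over a sequence.

    Assumption: each step multiplies state i by exp(delta_t * a_i), so the
    product over all steps is exp(a_i * sum(deltas)) and its magnitude is
    exp(Re(a_i) * sum(deltas)).
    """
    total = sum(deltas)
    return max(math.exp(complex(a).real * total) for a in diag)

# Hypothetical diagonals: one with negative abscissa, one with positive.
stable = [-0.5, -1.0, -2.0]
unstable = [0.1, -1.0, -2.0]

for seq_len in (10, 100):
    deltas = [0.1] * seq_len  # assumed constant positive step sizes
    print(
        seq_len,
        transition_gain(stable, deltas),
        transition_gain(unstable, deltas),
    )
```

Under these assumptions, a negative spectral abscissa makes the cumulative transition shrink as the sequence grows, while a positive one makes it grow exponentially with the accumulated step size, which is one intuition for why the quantity is tied to both training stability and length generalization.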
Taxonomy
Topics: Fault Detection and Control Systems
