Length independent generalization bounds for deep SSM architectures via Rademacher contraction and stability constraints
Dániel Rácz, Mihály Petreczky, Bálint Daróczy

TL;DR
This paper establishes length-independent generalization bounds for deep state-space model architectures with stable SSM blocks, providing theoretical support for their stability-based design choices.
Contribution
The paper derives a PAC bound for deep SSM-based sequence models that depends on the stability of the SSM blocks rather than on the length of the input sequence, which justifies the common practice of using stable SSM blocks.
Findings
PAC bound independent of sequence length
Bound improves with increased SSM stability
Supports stability as a beneficial design principle
Abstract
Many state-of-the-art models trained on long-range sequences, for example S4, S5 or LRU, are made of sequential blocks combining State-Space Models (SSMs) with neural networks. In this paper we provide a PAC bound that holds for this kind of architecture with stable SSM blocks and does not depend on the length of the input sequence. Imposing stability on the SSM blocks is standard practice in the literature, and it is known to help performance. Our results provide a theoretical justification for the use of stable SSM blocks, as the proposed PAC bound decreases as the degree of stability of the SSM blocks increases.
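To make the notion of a stable SSM block concrete, the following is a minimal sketch and not the paper's construction or its PAC bound. It assumes a discrete-time, single-input, single-output SSM with a diagonal state matrix whose entries have magnitude below 1. The names run_diagonal_ssm and gain_bound and all parameter values are hypothetical, chosen only for illustration. For inputs bounded by 1, the output is bounded by sum_i |c_i b_i| / (1 - |a_i|), a quantity that does not depend on the sequence length and that shrinks as the |a_i| move further below 1.

```python
import random

def run_diagonal_ssm(a, b, c, inputs):
    """Simulate x[t+1] = a * x[t] + b * u[t], y[t] = <c, x[t+1]> with diagonal a.

    Returns the largest |y[t]| seen over the whole input sequence.
    """
    x = [0.0] * len(a)
    peak = 0.0
    for u in inputs:
        # Elementwise state update (the state matrix is diagonal).
        x = [ai * xi + bi * u for ai, xi, bi in zip(a, x, b)]
        y = sum(ci * xi for ci, xi in zip(c, x))
        peak = max(peak, abs(y))
    return peak

def gain_bound(a, b, c):
    """Length-independent bound on sup |y| when |u[t]| <= 1 and all |a_i| < 1."""
    return sum(abs(ci * bi) / (1.0 - abs(ai)) for ai, bi, ci in zip(a, b, c))

# Hypothetical parameters (assumption: every |a_i| < 1, i.e. the block is stable).
a = [0.9, -0.5, 0.3]
b = [1.0, 0.5, -2.0]
c = [0.2, 1.0, 0.4]

rng = random.Random(0)
bound = gain_bound(a, b, c)
peaks = {}
for length in (10, 1_000, 100_000):
    inputs = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    peaks[length] = run_diagonal_ssm(a, b, c, inputs)
    print(length, peaks[length] <= bound)

# Shrinking the diagonal entries (a "more stable" block) tightens the bound.
more_stable_bound = gain_bound([0.5 * ai for ai in a], b, c)
print(more_stable_bound < bound)
```

The same bound covers every sequence length tried, which is the elementary mechanism behind length-independent quantities for stable SSMs. The paper's result concerns generalization error for deep architectures and is considerably more involved than this output bound.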
Taxonomy
Topics: Speech Recognition and Synthesis · Machine Learning and Algorithms · Machine Fault Diagnosis Techniques
