Length independent generalization bounds for deep SSM architectures via Rademacher contraction and stability constraints

Dániel Rácz; Mihály Petreczky; Bálint Daróczy

arXiv:2405.20278·cs.LG·May 27, 2025

Open Access

TL;DR

This paper establishes length-independent generalization bounds for deep state-space model architectures with stable SSM blocks, providing theoretical support for the common practice of enforcing stability in those blocks.

Contribution

It introduces a PAC bound for long-range sequence models that depends on stability, not sequence length, justifying the use of stable SSM blocks.

Findings

01 PAC bound independent of sequence length
02 Bound improves with increased SSM stability
03 Supports stability as a beneficial design principle

Abstract

Many state-of-the-art models trained on long-range sequences, for example S4, S5 or LRU, are made of sequential blocks combining State-Space Models (SSMs) with neural networks. In this paper we provide a PAC bound that holds for this kind of architecture with stable SSM blocks and does not depend on the length of the input sequence. Imposing stability of the SSM blocks is standard practice in the literature, and it is known to help performance. Our results provide a theoretical justification for the use of stable SSM blocks, as the proposed PAC bound decreases as the degree of stability of the SSM blocks increases.
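The intuition behind a length-independent quantity can be sketched with a toy example. The snippet below is not the paper's bound; it is a minimal illustration under stated assumptions: a scalar linear SSM x[k+1] = lam*x[k] + b*u[k], y[k] = c*x[k], with hypothetical helper names `impulse_l1_norm` and `stable_limit`. For |lam| < 1 the summed magnitude of the Markov parameters c * lam**k * b stays below |c||b| / (1 - |lam|) however long the sequence is, and that ceiling shrinks as |lam| moves away from 1. For |lam| = 1 the same sum grows with the sequence length.

```python
def impulse_l1_norm(lam, b, c, length):
    """Sum of |c * lam**k * b| for k = 0..length-1, i.e. the first `length`
    Markov parameters of the scalar SSM (illustrative assumption, not the
    paper's setting)."""
    return sum(abs(c * lam**k * b) for k in range(length))


def stable_limit(lam, b, c):
    """Closed-form limit of the sum above as length grows (geometric series).
    Only defined for a stable system, |lam| < 1."""
    if abs(lam) >= 1:
        raise ValueError("the SSM is not stable: |lam| must be < 1")
    return abs(c) * abs(b) / (1 - abs(lam))


if __name__ == "__main__":
    for lam in (0.5, 0.9, 1.0):
        sums = [impulse_l1_norm(lam, 1.0, 1.0, n) for n in (10, 100, 1000)]
        # The ceiling exists only in the stable case.
        ceiling = stable_limit(lam, 1.0, 1.0) if abs(lam) < 1 else None
        print(f"lam={lam}: sums for lengths 10/100/1000 = {sums}, ceiling = {ceiling}")
```

For an input bounded by 1 in magnitude, this sum also caps the magnitude of every output sample, which is why a stability margin can stand in for sequence length in this kind of argument.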

Taxonomy

Topics: Speech Recognition and Synthesis · Machine Learning and Algorithms · Machine Fault Diagnosis Techniques