The Effect of Depth on the Expressivity of Deep Linear State-Space Models
Zeyu Bao, Penghao Yu, Haotian Jiang, Qianxiao Li

TL;DR
This paper systematically analyzes how depth and width affect the expressiveness of deep linear state-space models, revealing conditions under which depth enhances capacity, especially under parameter norm constraints.
Contribution
It provides a theoretical framework characterizing how depth and width influence expressiveness, including their equivalence in the absence of parameter constraints, their differences under norm constraints, and upper bounds on the minimal depth required for representation.
Findings
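Without parameter constraints, increasing depth and increasing width are generally equivalent, provided the parameter count stays within the same order of magnitude.
Under parameter norm constraints, a deep linear SSM with small parameter norms can represent a shallow linear SSM with large parameter norms.
Upper bounds on the minimal depth needed for representation are derived.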
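The equivalence finding has a simple intuition in one direction: stacking linear SSM layers without nonlinearities yields another linear SSM whose state dimension is the sum of the layers' state dimensions (the standard series connection of linear systems). Below is a minimal sketch of that direction only. It assumes the discrete-time parameterization h_t = A h_{t-1} + B x_t, y_t = C h_t with zero initial state (h_0 = 0), no skip connections, and no feedthrough term. The helpers `run_ssm` and `cascade` are hypothetical names for illustration, not the paper's code, and the converse direction (wide into deep) is not reproduced here.

```python
import numpy as np


def run_ssm(A, B, C, x):
    """Run h_t = A h_{t-1} + B x_t, y_t = C h_t with h_0 = 0 (assumed convention)."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B @ np.atleast_1d(x_t)
        ys.append(C @ h)
    return np.array(ys)


def cascade(layer1, layer2):
    """Merge two stacked linear SSM layers into one SSM with state [h1; h2].

    Derivation: h2_t = A2 h2_{t-1} + B2 C1 h1_t
                     = A2 h2_{t-1} + B2 C1 A1 h1_{t-1} + B2 C1 B1 x_t.
    """
    A1, B1, C1 = layer1
    A2, B2, C2 = layer2
    n1, n2 = A1.shape[0], A2.shape[0]
    A = np.block([[A1, np.zeros((n1, n2))], [B2 @ C1 @ A1, A2]])
    B = np.vstack([B1, B2 @ C1 @ B1])
    C = np.hstack([np.zeros((C2.shape[0], n1)), C2])
    return A, B, C


rng = np.random.default_rng(0)
layer1 = (0.5 * rng.standard_normal((3, 3)), rng.standard_normal((3, 2)), rng.standard_normal((2, 3)))
layer2 = (0.5 * rng.standard_normal((4, 4)), rng.standard_normal((4, 2)), rng.standard_normal((1, 4)))
x = rng.standard_normal((20, 2))

deep_out = run_ssm(*layer2, run_ssm(*layer1, x))   # two layers, state dims 3 and 4
wide_out = run_ssm(*cascade(layer1, layer2), x)    # one layer, state dim 7
print(np.allclose(deep_out, wide_out))  # True
```

The merged system is block lower-triangular, so the two-layer model and the single wider model produce the same outputs up to floating-point error.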
Abstract
Deep state-space models (SSMs) have gained increasing popularity in sequence modelling. While there are numerous theoretical investigations of shallow SSMs, how the depth of the SSM affects its expressiveness remains a crucial problem. In this paper, we systematically investigate the role of depth and width in deep linear SSMs, aiming to characterize how they influence the expressive capacity of the architecture. First, we rigorously prove that in the absence of parameter constraints, increasing depth and increasing width are generally equivalent, provided that the parameter count remains within the same order of magnitude. However, under the assumption that the parameter norms are constrained, the effects of depth and width differ significantly. We show that a shallow linear SSM with large parameter norms can be represented by a deep linear SSM with smaller norms using a constructive…
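One intuition consistent with the norm-constrained result is a multiplicative effect: in a stack of linear layers, the end-to-end gain is the product of the per-layer gains, so a large target gain G can be split into L factors of G^{1/L}. The following is a minimal sketch of that intuition only, not the paper's constructive proof. It assumes scalar-state layers h_t = a h_{t-1} + b u_t, y_t = c h_t with h_0 = 0, a fixed decay a = 0.5, and max(|b|, |c|) as a stand-in for the parameter norm. The helpers `layer_params` and `run_scalar_stack` are hypothetical.

```python
def layer_params(per_layer_gain, a=0.5):
    """Pick b = c so that the layer's DC gain c*b/(1-a) equals per_layer_gain."""
    b = c = (per_layer_gain * (1.0 - a)) ** 0.5
    return a, b, c


def run_scalar_stack(layers, x):
    """Apply scalar layers h_t = a h_{t-1} + b u_t, y_t = c h_t in sequence, h_0 = 0."""
    u = list(x)
    for a, b, c in layers:
        h, out = 0.0, []
        for u_t in u:
            h = a * h + b * u_t
            out.append(c * h)
        u = out
    return u


target_gain = 1000.0
for depth in (1, 3):
    g = target_gain ** (1.0 / depth)       # split the target gain evenly across layers
    layers = [layer_params(g)] * depth
    y = run_scalar_stack(layers, [1.0] * 200)  # constant input: output settles at the total gain
    max_param = max(max(abs(b), abs(c)) for _, b, c in layers)
    print(depth, round(y[-1], 3), round(max_param, 3))
```

With a constant input, the final output approaches 1000 for both depths, while the largest per-layer parameter drops from about 22.4 (depth 1) to about 2.24 (depth 3). The stacked model here matches only the total gain, not the full convolution kernel of the shallow model; exact representation of the shallow SSM is the subject of the paper's construction.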
Peer Reviews
Decision: Submitted to ICLR 2026
The authors provide a compelling argument in the introduction to motivate analyzing how depth affects the expressive capacity of deep linear SSMs, and they provide good foundational background on related work. The authors also provide a solid theoretical analysis (Appendix A).
The related work paragraph headers are inconsistent. The theoretical framework could be better integrated into the overall narrative (with some of it moved to the Appendix); currently, the theory and derived results are quite dense and could distract from the key results. Section 5 is very verbose and could be tied together more concisely; the overwhelming amount of text currently hides the results. The empirical results could be expanded to cover more complex tasks and models.
The authors provide insights into the expressivity of linear SSMs, with proofs that wide SSMs can be equivalently represented by deep SSMs. The finding that deep SSMs with smaller parameter norms can represent wide SSMs with larger parameter norms is insightful. Similar trends are shown for nonlinear models. The reviewer believes this is an interesting contribution that furthers the understanding of SSMs.
More effort could be spent on improving the presentation and flow of the paper. In Section 3.1, for example, the hidden state of each layer at time step 0, $h_l(0)$, is not initialized, and the writing could be improved. The theoretical statements are sometimes vague and imprecise. Overall, the paper is hard to follow from a theoretical point of view due to a lack of mathematical rigor. The constant $c_1>0$ is not used in the statement of Theorem 1. In Theorem 2 and Corollary 1, I assume th…
1. The study of practical trade-offs in the design of deep SSMs is (imo) chronically lacking in the literature; this paper is a step towards addressing that.
2. The paper is relatively well written and relatively easy to follow.
3. The experiments do support some of the claims made.
Major Weaknesses
My main reservation about this paper concerns its purpose and the level of significance it achieves:
1. In practice, deep SSMs always use non-linearities. Therefore, I will always struggle, on a fairly fundamental level, to be too enthusiastic about work that studies a theoretical construct. This is not so much a comment on the quality of the work as on its significance. This may be acceptable for learning-theory-specific conferences or workshops, but these types…
Taxonomy
Topics: Neural Networks and Applications
