On the Expressiveness and Length Generalization of Selective State-Space Models on Regular Languages
Aleksandar Terzić, Michael Hersche, Giacomo Camposampiero, Thomas Hofmann, Abu Sebastian, Abbas Rahimi

TL;DR
This paper analyzes the expressiveness and length generalization of selective state-space models (SSMs) on regular languages, introduces a novel SD-SSM architecture with perfect length generalization, and evaluates its performance on automata tasks.
Contribution
The paper introduces the Selective Dense State-Space Model (SD-SSM), the first selective SSM with perfect length generalization on regular language tasks.
Findings
SD-SSM achieves perfect length generalization on various regular language tasks.
Several variants of diagonal selective SSMs are empirically evaluated on finite-state automaton emulation tasks.
Theoretical insights explain the experimental performance of selective SSMs.
Abstract
Selective state-space models (SSMs) are an emerging alternative to the Transformer, offering the unique advantage of parallel training and sequential inference. Although these models have shown promising performance on a variety of tasks, their formal expressiveness and length generalization properties remain underexplored. In this work, we provide insight into the workings of selective SSMs by analyzing their expressiveness and length generalization performance on regular language tasks, i.e., finite-state automaton (FSA) emulation. We address certain limitations of modern SSM-based architectures by introducing the Selective Dense State-Space Model (SD-SSM), the first selective SSM that exhibits perfect length generalization on a diverse set of regular language tasks using a single layer. It utilizes a dictionary of dense transition matrices, a softmax selection mechanism that creates…
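To make the mechanism named in the abstract more concrete, here is a minimal pure-Python sketch of a softmax-selected dense transition. It assumes a state update of the form x_t = A(u_t) x_{t-1}, where A(u_t) = Σ_k s_k(u_t) A_k and s(u_t) is a softmax over per-token logits. This is not the authors' implementation: the input-injection term, the readout, any normalization, and the learned selection layer are omitted, and the function names, the parity dictionary, and the hand-set logits are hypothetical. With one-hot states and large logits the selection is nearly hard, so the recurrence emulates a two-state parity automaton.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(A, x):
    # Dense matrix-vector product A @ x using nested lists.
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def combine(dictionary, weights):
    # Weighted sum of K dense n x n matrices: A_t = sum_k w_k * A_k.
    n = len(dictionary[0])
    return [
        [sum(w * A[i][j] for w, A in zip(weights, dictionary)) for j in range(n)]
        for i in range(n)
    ]


def sd_ssm_like_scan(tokens, dictionary, selection_logits, x0):
    # Hypothetical simplified recurrence: only the selected dense transition,
    # no input injection, readout, or normalization.
    x = list(x0)
    for tok in tokens:
        weights = softmax(selection_logits[tok])  # soft selection over the dictionary
        A_t = combine(dictionary, weights)        # input-dependent dense transition
        x = matvec(A_t, x)                        # x_t = A_t x_{t-1}
    return x


# Hypothetical parity automaton: state 0 = "even", state 1 = "odd".
# Column convention: A[next_state][current_state] = 1.
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]  # symbol 0 keeps the state
SWAP = [[0.0, 1.0], [1.0, 0.0]]      # symbol 1 flips the state
DICTIONARY = [IDENTITY, SWAP]
# Hand-set logits (a trained model would compute these from embeddings).
LOGITS = {0: [20.0, -20.0], 1: [-20.0, 20.0]}

x = sd_ssm_like_scan([1, 0, 1, 1], DICTIONARY, LOGITS, [1.0, 0.0])
print("odd" if x.index(max(x)) == 1 else "even")  # → odd
```

Setting the logits to ±20 makes the softmax effectively one-hot, so each step applies one transition matrix of the automaton to a one-hot state vector. In a trained model the weights are soft and learned, and whether the state stays well-behaved on long sequences depends on details this sketch leaves out.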
