State Space Models are Provably Comparable to Transformers in Dynamic Token Selection

Naoki Nishikawa; Taiji Suzuki

arXiv:2405.19036 · stat.ML · March 6, 2025

TL;DR

This paper demonstrates that state space models combined with nonlinear layers are theoretically comparable to Transformers in token selection and function estimation, offering a computationally efficient alternative.

Contribution

The paper gives a theoretical analysis showing that SSMs combined with nonlinear layers are comparable to Transformers in sequence modeling tasks that require input-dependent token selection.

Findings

1. SSMs combined with nonlinear layers can solve synthetic tasks that are challenging for a single SSM layer.

2. SSMs are theoretically comparable to Transformers in nonparametric regression.

3. SSMs offer a computationally efficient alternative to Transformers.

Abstract

Deep neural networks based on state space models (SSMs) are attracting significant attention in sequence modeling since their computational cost is much smaller than that of Transformers. While the capabilities of SSMs have been demonstrated through experiments in various tasks, theoretical understanding of SSMs is still limited. In particular, most theoretical studies discuss the capabilities of SSM layers without nonlinear layers, and there is a lack of discussion on their combination with nonlinear layers. In this paper, we explore the capabilities of SSMs combined with fully connected neural networks, and show that they are comparable to Transformers in extracting the essential tokens depending on the input. As concrete examples, we consider two synthetic tasks, which are challenging for a single SSM layer, and demonstrate that SSMs combined with nonlinear layers can efficiently…
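To make the phrase "SSMs combined with fully connected neural networks" concrete, here is a minimal sketch of one such block: a diagonal linear state space recurrence followed by a token-wise two-layer ReLU network. It is a toy illustration under stated assumptions, not the construction analyzed in the paper. The function names (`ssm_layer`, `fnn`, `ssm_block`), the scalar input and output per token, and the parameter layout are all hypothetical choices made for readability.

```python
def ssm_layer(u, a, b, c):
    """Diagonal linear SSM (illustrative assumption, scalar input per token).

    Recurrence: h_t = a * h_{t-1} + b * u_t   (elementwise over the state)
    Readout:    y_t = <c, h_t>
    u: list of T floats; a, b, c: lists of N floats (state size N).
    """
    h = [0.0] * len(a)
    ys = []
    for u_t in u:
        # One recurrence step costs O(N), so the whole pass is O(T * N).
        h = [a_i * h_i + b_i * u_t for a_i, h_i, b_i in zip(a, h, b)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


def fnn(y, w1, b1, w2, b2):
    """Token-wise two-layer ReLU network acting on a scalar y."""
    hidden = [max(0.0, w * y + b) for w, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def ssm_block(u, params):
    """SSM layer followed by the nonlinear layer, applied per token."""
    ys = ssm_layer(u, params["a"], params["b"], params["c"])
    return [
        fnn(y, params["w1"], params["b1"], params["w2"], params["b2"])
        for y in ys
    ]


if __name__ == "__main__":
    # Hypothetical parameters chosen only so the arithmetic is easy to follow.
    params = {
        "a": [0.5], "b": [1.0], "c": [1.0],
        "w1": [1.0], "b1": [0.0], "w2": [2.0], "b2": 0.0,
    }
    print(ssm_block([1.0, 0.0, 0.0], params))
```

The recurrence touches each token once, so the sequential cost grows linearly in the sequence length, whereas self-attention compares every pair of tokens. The nonlinearity enters only through the token-wise network after the linear recurrence.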

Videos

State Space Models are Provably Comparable to Transformers in Dynamic Token Selection · SlidesLive

Taxonomy

Topics: Neural Networks and Applications · Fault Detection and Control Systems · Control Systems and Identification