Sequential-Parallel Duality in Prefix Scannable Models
Morris Yau, Sharut Gupta, Valerie Engelmayer, Kazuki Irie, Stefanie Jegelka, Jacob Andreas

TL;DR
This paper introduces Prefix-Scannable Models (PSMs), a broad class of neural sequence models that unify existing architectures and achieve near-constant-time parallel evaluation with linear-time sequential inference.
Contribution
The paper characterizes PSMs as a general framework for sequence models supporting efficient inference, unifying and extending prior models like GLA and Mamba.
Findings
PSMs match the inference efficiency of state space models.
PSMs retain expressivity comparable to transformers.
Some PSMs exhibit better length generalization.
Abstract
Modern neural sequence models are designed to meet the dual mandate of parallelizable training and fast sequential inference. Recent developments have given rise to various models, such as Gated Linear Attention (GLA) and Mamba, that achieve such "sequential-parallel duality." This raises a natural question: can we characterize the full class of neural sequence models that support near-constant-time parallel evaluation and linear-time, constant-space sequential inference? We begin by describing a broad class of such models, state space models, as those whose state updates can be computed using the classic parallel prefix scan algorithm with a custom associative aggregation operator. We then define a more general class, Prefix-Scannable Models (PSMs), by relaxing the state aggregation operator to allow arbitrary (potentially non-associative) functions such as softmax attention.…
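The abstract's starting point, state updates computed by a prefix scan with an associative aggregation operator, can be illustrated with a small sketch. It is not taken from the paper: it assumes a scalar affine recurrence h_t = a_t * h_{t-1} + b_t as a stand-in for a state space update, and the names `combine`, `sequential_scan`, and `doubling_scan` are hypothetical. It covers only the associative case, not the relaxed (possibly non-associative) operators that define PSMs.

```python
from typing import Callable, List, Tuple

# Assumption: each step is the affine map h -> a * h + b, stored as (a, b).
Affine = Tuple[float, float]


def combine(left: Affine, right: Affine) -> Affine:
    """Compose two affine maps, applying `left` first and `right` second.

    Composition of affine maps is associative, which is what lets the
    scan regroup the work without changing the result.
    """
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential_scan(items: List[Affine],
                    op: Callable[[Affine, Affine], Affine]) -> List[Affine]:
    """Inclusive prefix scan, one element at a time (linear time)."""
    out: List[Affine] = []
    acc = None
    for x in items:
        acc = x if acc is None else op(acc, x)
        out.append(acc)
    return out


def doubling_scan(items: List[Affine],
                  op: Callable[[Affine, Affine], Affine]) -> List[Affine]:
    """Inclusive prefix scan by recursive doubling (Hillis-Steele style).

    Runs about log2(n) rounds. Every update inside a round is independent
    of the others, so a parallel machine could do each round in one step;
    this pure-Python version only simulates that schedule.
    """
    out = list(items)
    step = 1
    while step < len(out):
        prev = out
        out = [
            prev[i] if i < step else op(prev[i - step], prev[i])
            for i in range(len(prev))
        ]
        step *= 2
    return out


if __name__ == "__main__":
    steps = [(0.5, 1.0), (2.0, -1.0), (1.0, 3.0), (0.25, 0.0), (3.0, 2.0)]
    # With initial state h_0 = 0, the state after step t is the `b`
    # component of the t-th prefix.
    print([b for _, b in doubling_scan(steps, combine)])
    print([b for _, b in sequential_scan(steps, combine)])
```

Both functions return the same prefixes because `combine` is associative. With an operator that is not associative, the grouping chosen by the scan would change the result, which is why the relaxation described in the abstract is a genuinely larger model class rather than a reformulation.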
Taxonomy
Topics: Topic Modeling · Machine Learning and Algorithms · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Softmax · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
