Structured Recurrent Mixers for Massively Parallelized Sequence Generation

Benjamin L. Badger

arXiv:2605.08696·cs.CL·May 20, 2026

TL;DR

The paper introduces the Structured Recurrent Mixer (SRM), an architecture with two equivalent sequence representations: a parallel form for efficient training and a recurrent form for high-throughput inference. It reports improvements over existing linear-complexity models.

Contribution

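The paper presents an architecture that converts algebraically between a sequence-parallel representation at train time and a recurrent representation at inference, without specialized kernels or device-specific memory management. This improves both training efficiency and inference throughput.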
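
To make the idea of a dual representation concrete, here is a minimal sketch in Python. It is not the paper's SRM layer, whose exact form is not given on this page. It assumes a generic scalar linear recurrence, state_t = decay * state_(t-1) + x_t, and shows that the same outputs can be computed either all at once with a lower-triangular mixing matrix or one token at a time with a constant-size state. The names mixing_matrix, parallel_form, recurrent_form, and decay are hypothetical and chosen only for illustration.

```python
# Illustrative only: a generic linear recurrence, NOT the paper's SRM layer.
# All function names and the `decay` parameter are assumptions for this sketch.

def mixing_matrix(decay, length):
    """Lower-triangular matrix M with M[t][s] = decay**(t - s) for s <= t."""
    return [
        [decay ** (t - s) if s <= t else 0.0 for s in range(length)]
        for t in range(length)
    ]


def parallel_form(xs, decay):
    """Compute every output at once as M @ xs (the training-style view)."""
    n = len(xs)
    m = mixing_matrix(decay, n)
    return [sum(m[t][s] * xs[s] for s in range(n)) for t in range(n)]


def recurrent_form(xs, decay):
    """Compute outputs token by token with a single scalar state
    (the inference-style view)."""
    state = 0.0
    outputs = []
    for x in xs:
        state = decay * state + x
        outputs.append(state)
    return outputs


xs = [1.0, 2.0, 3.0, 4.0]
decay = 0.5
par = parallel_form(xs, decay)
rec = recurrent_form(xs, decay)

# The two views agree up to floating-point error.
assert all(abs(p - r) < 1e-9 for p, r in zip(par, rec))
```

In this toy case the parallel form does work quadratic in sequence length but has no sequential dependency, while the recurrent form carries only one number between steps. That is the general trade-off the contribution statement refers to. How the SRM structures its mixing weights to make the conversion possible is described in the paper itself.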

Findings

01

Greater training efficiency and higher input information capacity than other linear-complexity models.

02

12x throughput and 170x concurrency improvements over Transformers.

03

Effective reinforcement learning training with SRMs.

Abstract

Over the last two decades, language modeling has experienced a shift from the use of predominantly recurrent architectures that process tokens sequentially during training and inference to non-recurrent models that process sequence elements in parallel during training, which results in greater training efficiency and stability at the expense of lower inference throughput. Here we introduce the Structured Recurrent Mixer, an architecture that allows for algebraic conversion between a sequence parallel representation at train time and a recurrent representation at inference, notably without the need for specialized kernels or device-specific memory management. We show experimentally that this dual representation allows for greater training efficiency, higher input information capacity, and larger inference throughput and concurrency when compared to other linear complexity models. We…
