Superlinear Multi-Step Attention

Yufeng Huang

arXiv:2601.18401·cs.LG·January 27, 2026

TL;DR

This paper introduces Superlinear attention, a multi-step attention architecture that reduces the cost of attention over long sequences from quadratic to subquadratic in sequence length while leaving every eligible token selectable, enabling efficient processing of very long contexts.

Contribution

It presents a fully trainable multi-step attention mechanism with subquadratic complexity, combining span search and attention, and demonstrates its feasibility and initial effectiveness on long-context tasks.

Findings

01

Achieves $O(L^{1+1/N})$ complexity with multi-step search

02

Demonstrates strong performance on long-context tasks up to 256K tokens

03

Attains high decoding throughput on large models at long sequence lengths
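
To make the first finding concrete, the short Python sketch below evaluates the leading-order term $L^{1+1/N}$ for a few values of $N$. It is only illustrative arithmetic: the function name `attention_cost` is hypothetical, constant factors and any dependence on $N$ beyond the exponent are ignored, and 256K is assumed to mean 262,144 tokens.

```python
def attention_cost(L, N):
    """Leading-order operation count L**(1 + 1/N), with constants dropped."""
    return L ** (1 + 1 / N)


L = 256 * 1024  # assumption: "256K tokens" read as 2**18 = 262,144

for N in (1, 2, 3):
    # N = 1 gives the quadratic exponent; larger N lowers the exponent toward 1.
    print(f"N={N}: exponent={1 + 1 / N:.3f}, cost~{attention_cost(L, N):.3e}")
```

Under these assumptions, $N = 2$ at $L = 2^{18}$ gives $2^{27} \approx 1.34 \times 10^{8}$ against $2^{36} \approx 6.87 \times 10^{10}$ for the quadratic case, a factor of $\sqrt{L} = 512$ before constants are taken into account.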

Abstract

In this paper, we propose Superlinear attention, a fully trainable multi-step attention architecture that achieves subquadratic complexity for long sequences while preserving random context access (a.k.a. structural non-exclusion): no eligible token position is structurally excluded from being selected for attention. Superlinear attention reformulates standard causal self-attention as a multi-step search problem with $N$ steps, yielding an overall complexity of $O(L^{1 + \frac{1}{N}})$. To illustrate the architecture, we present a baseline $N = 2$ implementation, which is algorithmically analogous to standard jump search. In this $O(L^{3/2})$ instantiation, the first step performs $O(L^{3/2})$ span-search to select relevant spans of the sequence, and the second step applies $O(L^{3/2})$ span-attention (standard attention restricted to the selected spans). In an upscaled…
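
The two-step structure described in the abstract (span search, then attention restricted to the selected spans) can be sketched for a single query in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function `two_step_attention`, the use of each span's first key as its anchor, the span length of about $\sqrt{n}$, and the hard top-k selection are all hypothetical choices made here, and the sketch omits the training mechanism that the paper describes as fully trainable.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def two_step_attention(q, keys, values, t, top_k=2):
    """Attend from query q at position t to causally eligible positions 0..t.

    Step 1 (span search): score one anchor key per span of about sqrt(n) tokens.
    Step 2 (span attention): softmax attention over tokens in the top_k spans.
    Returns (output vector, sorted list of attended positions).
    """
    n = t + 1                       # number of causally eligible positions
    span = math.isqrt(n - 1) + 1    # ceil(sqrt(n)) in exact integer arithmetic
    starts = list(range(0, n, span))
    scale = 1.0 / math.sqrt(len(q))

    # Step 1: about sqrt(n) anchor scores (assumption: anchor = first key of span).
    span_scores = [dot(q, keys[s]) * scale for s in starts]
    ranked = sorted(range(len(starts)), key=lambda i: span_scores[i], reverse=True)
    chosen = sorted(ranked[:top_k])

    # Any span can be chosen, so no eligible position is excluded in advance.
    positions = [p for i in chosen
                 for p in range(starts[i], min(starts[i] + span, n))]

    # Step 2: standard softmax attention restricted to the selected positions.
    logits = [dot(q, keys[p]) * scale for p in positions]
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    out = [sum(w * values[p][j] for w, p in zip(weights, positions)) / total
           for j in range(len(values[0]))]
    return out, positions
```

In this sketch each query scores about $\sqrt{n}$ anchors and then attends to at most `top_k` spans of about $\sqrt{n}$ tokens, so the work per query is on the order of $\sqrt{n}$ and the total over $L$ queries is on the order of $L^{3/2}$, matching the exponent quoted above. Setting `top_k` to the number of spans makes the attended set equal to all eligible positions.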

Code & Models

Models

concavity-ai/superlinear-exp-v0.1 (Hugging Face model · 14 downloads · 23 likes)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Stochastic Gradient Optimization Techniques