Sublinear Time Quantum Algorithm for Attention Approximation

Zhao Song; Jianfei Xue; Jiahao Zhang; Lichen Zhang

arXiv:2602.00874·quant-ph·February 3, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces a quantum data structure that approximates individual rows of the transformer attention matrix in time sublinear in the sequence length $n$.

Contribution

The paper presents the first quantum data structure for row-wise approximation of attention matrices with sublinear time complexity, combining quantum Nyström, mean estimation, and leverage score sampling.

Findings

01

Achieves sublinear time approximation of attention rows.

02

Preprocessing time depends on statistical dimension and stable rank.

03

Each row query is answered in time near-linear in the statistical dimension.

Abstract

Given the query, key, and value matrices $Q, K, V \in \mathbb{R}^{n \times d}$, the attention module is defined as $\mathrm{Att}(Q, K, V) = D^{-1} A V$, where $A = \exp(Q K^{\top} / d)$ with $\exp(\cdot)$ applied entrywise and $D = \mathrm{diag}(A \mathbf{1}_{n})$. The attention module is the backbone of modern transformers and large language models, but explicitly forming the softmax matrix $D^{-1} A$ incurs $\Omega(n^{2})$ time, motivating numerous approximation schemes that reduce runtime to $O(n d)$ via sparsity or low-rank factorization. We propose a quantum data structure that approximates any row of $\mathrm{Att}(Q, K, V)$ using only row queries to $Q, K, V$. Our algorithm preprocesses these matrices in $O(\epsilon^{-1} n^{0.5} (s_{\lambda}^{2.5} + s_{\lambda}^{1.5} d + \alpha^{0.5} d))$ time, where $\epsilon$ is the target accuracy, $s_{\lambda}$ is the…
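To make the definition in the abstract concrete, here is a minimal classical sketch of computing one exact row of $D^{-1} A V$ using only row access to $Q$, $K$, and $V$. It is not the paper's quantum algorithm; it only illustrates the quantity being approximated and why a single exact row costs on the order of $n d$ operations classically. The function name `attention_row` and the toy matrices are hypothetical, chosen for illustration.

```python
import math


def attention_row(i, Q, K, V):
    """Return row i of D^{-1} A V, where A = exp(Q K^T / d) entrywise.

    Q, K, V are lists of n rows, each a list of d floats (an assumption
    of this sketch; the abstract only specifies n x d real matrices).
    """
    d = len(Q[i])
    # Unnormalized weights A[i][j] = exp(<q_i, k_j> / d), one per key row.
    weights = [
        math.exp(sum(q * k for q, k in zip(Q[i], key_row)) / d)
        for key_row in K
    ]
    # Diagonal entry D[i][i] = (A 1_n)_i, the sum of row i of A.
    normalizer = sum(weights)
    # Row i of the output is a weighted average of the rows of V.
    return [
        sum(w * value_row[c] for w, value_row in zip(weights, V)) / normalizer
        for c in range(len(V[0]))
    ]


# Toy example with n = 2, d = 2 (hypothetical values).
Q = [[0.0, 0.0], [1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]

row0 = attention_row(0, Q, K, V)  # q_0 = 0 gives equal weights on both keys
row1 = attention_row(1, Q, K, V)
```

Because the first query row is all zeros, both keys receive weight $\exp(0) = 1$, so `row0` is the plain average of the rows of `V`. The loop over all $n$ key rows is the linear-in-$n$ cost per row that a sublinear-time data structure aims to avoid.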

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

S1: High Originality and Theoretical Significance: This work, to the best of my knowledge, is the first to propose a sublinear-time quantum algorithm for approximating the standard Transformer attention mechanism in the row-query setting. Achieving a preprocessing complexity of $\tilde{O}(n^{0.5})$, the method provides a potential quadratic speedup over classical algorithms. This represents a meaningful theoretical advance and offers a new perspective on overcoming the quadratic bottleneck in la

Weaknesses

W1: Lack of Empirical Validation: The paper is entirely theoretical and does not provide any numerical simulation or small-scale experiment to illustrate the potential practical impact of the proposed method. While this is acceptable for a theoretical contribution, even a simple empirical demonstration (e.g., simulated quantum runtime scaling or synthetic kernel approximation) would help substantiate the claimed sublinear advantages.

W2: Symmetrization Limitation: Because the algorithm approxim

Reviewer 02 · Rating 8 · Confidence 2

Strengths

The main strength of the paper is to present a sublinear time algorithm that answers row queries for attention approximation in the quantum model. The techniques are very interesting and conceptually simple.

Weaknesses

Perhaps one minor weakness is that a few previous works on attention approximation achieve spectral norm approximation guarantees, and it would be good to prove such a guarantee here as well.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The work is the first to achieve sublinear-in-$n$ row queries for attention approximation using quantum methods. 2. The approach makes no structural assumptions, making it widely applicable.

Weaknesses

1. Parameter dependence: The runtime depends on $s_{\lambda}$ and $\alpha$, which may be large in practice, limiting practical speedups. 2. Norm of $D^{-1}$ assumption: The guarantee requires $\|D^{-1}\| < (\epsilon \|E\| + \lambda n)^{-1}$, which may not hold in all settings.

Taxonomy

Topics: Quantum Computing Algorithms and Architecture · Quantum many-body systems · Quantum Information and Cryptography