Sublinear Time Quantum Algorithm for Attention Approximation
Zhao Song, Jianfei Xue, Jiahao Zhang, Lichen Zhang

TL;DR
This paper introduces a quantum data structure that approximates individual rows of the transformer attention matrix in time sublinear in the sequence length, which the authors present as the first result of its kind.
Contribution
It presents the first quantum data structure for row-wise approximation of attention matrices with sublinear time complexity, leveraging quantum Nyström, mean estimation, and leverage score sampling.
Findings
Achieves sublinear time approximation of attention rows.
Preprocessing time depends on statistical dimension and stable rank.
Each row query is answered in time near-linear in the statistical dimension.
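The two quantities named in the findings can be made concrete with a short sketch. It assumes the textbook definitions, which this summary does not spell out: the statistical dimension of a positive semidefinite kernel matrix with eigenvalues sigma_i is s_lambda = sum_i sigma_i / (sigma_i + lambda), and the stable rank of a matrix V is ||V||_F^2 / ||V||_2^2. The function names and the power-iteration helper are hypothetical and purely classical; they are not the paper's quantum routines.

```python
import math

def statistical_dimension(eigenvalues, lam):
    """s_lambda = sum_i sigma_i / (sigma_i + lam), for eigenvalues of a PSD matrix."""
    return sum(s / (s + lam) for s in eigenvalues)

def stable_rank(V, iters=200):
    """||V||_F^2 / ||V||_2^2, estimating the spectral norm by power iteration.

    Assumes V is nonzero and the all-ones start vector is not orthogonal
    to the top right singular vector.
    """
    n, d = len(V), len(V[0])
    fro_sq = sum(x * x for row in V for x in row)
    x = [1.0] * d
    for _ in range(iters):
        Vx = [sum(V[i][j] * x[j] for j in range(d)) for i in range(n)]   # V x
        y = [sum(V[i][j] * Vx[i] for i in range(n)) for j in range(d)]   # V^T V x
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]
    Vx = [sum(V[i][j] * x[j] for j in range(d)) for i in range(n)]
    spec_sq = sum(t * t for t in Vx)  # ||V x||^2 with ||x|| = 1
    return fro_sq / spec_sq

s_lam = statistical_dimension([1.0, 1.0, 0.0], lam=1.0)
srank_identity = stable_rank([[1.0, 0.0], [0.0, 1.0]])
srank_rank_one = stable_rank([[1.0, 2.0], [2.0, 4.0]])
```

A rank-one matrix has stable rank 1 and an identity matrix has stable rank equal to its dimension, so the stable rank behaves like a noise-tolerant version of the rank.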
Abstract
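Given the query, key and value matrices $Q, K, V \in \mathbb{R}^{n \times d}$, the attention module is defined as $\mathrm{Att}(Q, K, V) = D^{-1} A V$, where $A = \exp(QK^\top/\sqrt{d})$ with $\exp(\cdot)$ applied entrywise and $D = \mathrm{diag}(A \mathbf{1}_n)$. The attention module is the backbone of modern transformers and large language models, but explicitly forming the softmax matrix $D^{-1}A$ incurs $\Omega(n^2)$ time, motivating numerous approximation schemes that reduce runtime to near-linear in $n$ via sparsity or low-rank factorization. We propose a quantum data structure that approximates any row of $\mathrm{Att}(Q, K, V)$ using only row queries to $Q, K, V$. Our algorithm preprocesses these matrices in $\tilde{O}(n^{0.5})$ time in the sequence length $n$, with further dependence on $\epsilon$, $s_\lambda$ and $\alpha$, where $\epsilon$ is the target accuracy, $s_\lambda$ is the…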
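As a classical point of reference, a single attention row can be computed exactly in O(nd) time without forming the full n-by-n softmax matrix. The sketch below assumes the usual definition Att(Q, K, V) = D^{-1} A V with A = exp(QK^T / sqrt(d)); the function name `attention_row` and the toy matrices are illustrative and not taken from the paper. It shows the quantity that a row query targets, not the quantum algorithm itself.

```python
import math

def attention_row(Q, K, V, i):
    """Row i of D^{-1} A V, computed classically in O(n d) time."""
    d = len(Q[i])
    # Scaled inner products <q_i, k_j> / sqrt(d) for every key k_j.
    scores = [sum(q * k for q, k in zip(Q[i], K_j)) / math.sqrt(d) for K_j in K]
    # Subtracting the max leaves the softmax unchanged but avoids overflow.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # row i of A, rescaled
    total = sum(weights)                          # D_ii, rescaled the same way
    n_cols = len(V[0])
    return [sum(w * V_j[c] for w, V_j in zip(weights, V)) / total
            for c in range(n_cols)]

# Toy inputs with n = 3 tokens and d = 2 dimensions (illustrative values).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
row = attention_row(Q, K, V, 0)
```

Each output row is a convex combination of the rows of V, so every coordinate lies between the column minimum and maximum of V. This exact computation still touches all n keys per query, which is the linear cost that the sublinear row-query setting aims to beat.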
Peer Reviews
Decision·ICLR 2026 Poster
S1: High Originality and Theoretical Significance: This work, to the best of my knowledge, is the first to propose a sublinear-time quantum algorithm for approximating the standard Transformer attention mechanism in the row-query setting. Achieving a preprocessing complexity of $\tilde{O}(n^{0.5})$, the method provides a potential quadratic speedup over classical algorithms. This represents a meaningful theoretical advance and offers a new perspective on overcoming the quadratic bottleneck in la
W1: Lack of Empirical Validation: The paper is entirely theoretical and does not provide any numerical simulation or small-scale experiment to illustrate the potential practical impact of the proposed method. While this is acceptable for a theoretical contribution, even a simple empirical demonstration (e.g., simulated quantum runtime scaling or synthetic kernel approximation) would help substantiate the claimed sublinear advantages. W2: Symmetrization Limitation: Because the algorithm approxim
The main strength of the paper is to present a sublinear time algorithm that answers row queries for attention approximation in the quantum model. The techniques are very interesting and conceptually simple.
Perhaps one minor weakness is that few previous works on attention approximation achieve spectral norm approximation guarantees, and it would be good to prove such a guarantee here as well.
1. The work is the first to achieve sublinear-in-n row queries for attention approximation using quantum methods. 2. The approach makes no structural assumptions, making it widely applicable.
1. Parameter dependence: The runtime depends on $s$ and $\alpha$, which may be large in practice, limiting practical speedups. 2. Norm of $D^{-1}$ assumption: The guarantee requires $\|D^{-1}\| < (\epsilon\|E\| + \lambda n)^{-1}$, which may not hold in all settings.
Taxonomy
Topics: Quantum Computing Algorithms and Architecture · Quantum many-body systems · Quantum Information and Cryptography
