Power-based Partial Attention: Bridging Linear-Complexity and Full Attention

Yufeng Huang

arXiv:2601.17334·cs.LG·January 28, 2026

Open Access

TL;DR

This paper introduces power-based partial attention (PPA), a scalable attention mechanism that interpolates between linear and quadratic complexity, demonstrating that sub-quadratic attention can match full attention performance.

Contribution

The paper proposes PPA, a novel attention method of order O(L^{1+p}) that bridges linear and full attention, enabling analysis of attention complexity-performance trade-offs.
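
To make the stated order concrete, the sketch below evaluates the leading-order cost L^{1+p} for a few values of p. It rests only on the scaling quoted above; the function names `attention_cost` and `savings_vs_full` are illustrative, and constant factors and lower-order terms are ignored, so the numbers are rough guides rather than measured costs.

```python
def attention_cost(seq_len: int, p: float) -> float:
    """Leading-order attention cost L^(1+p), ignoring constant factors.

    This is an idealization based only on the stated O(L^{1+p}) order.
    """
    if seq_len < 1:
        raise ValueError("seq_len must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return float(seq_len) ** (1.0 + p)


def savings_vs_full(seq_len: int, p: float) -> float:
    """Ratio of full-attention cost (p = 1) to the cost at the given p."""
    return attention_cost(seq_len, 1.0) / attention_cost(seq_len, p)


if __name__ == "__main__":
    seq_len = 4096
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        cost = attention_cost(seq_len, p)
        ratio = savings_vs_full(seq_len, p)
        print(f"p={p:.2f}  cost~{cost:.3e}  full/partial={ratio:.1f}")
```

For example, at L = 4096 and p = 0.5 the leading-order cost is 4096^{1.5} = 262,144, which is 64 times smaller than the 16,777,216 of full attention.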

Findings

01. Sub-quadratic attention can achieve full-attention performance.

02. Performance transitions sharply from linear to full attention over a narrow parameter range.

03. There exists an intermediate p where attention complexity is reduced without performance loss.

Abstract

It is widely accepted from transformer research that "attention is all we need", but the amount of attention required has never been systematically quantified. Is quadratic $O(L^{2})$ attention necessary, or is there a sub-quadratic attention mechanism that can achieve comparable performance? To answer this question, we introduce power-based partial attention (PPA), an attention mechanism of order $O(L^{1+p})$, where $0 \leq p \leq 1$, such that $p = 0$ corresponds to sliding window attention with linear complexity, and $p = 1$ corresponds to full attention. With this attention construction, we can explore how transformer architecture performance varies as a function of the attention scaling behavior controlled by $p$. The overall trend from our experiments shows an S-curve-like behavior where the performance transitions from sliding-window (linear-complexity) attention to full attention over…
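
One way to picture an attention pattern with these two endpoints is a causal window whose width grows as a power of the position. The sketch below is an assumption made for illustration, not the construction from the paper, which this page does not describe: the rule "query i attends to its `ceil(base_window * (i+1)^p)` most recent tokens" and the names `power_window_sizes`, `power_window_mask`, `attended_pairs`, and `base_window` are all hypothetical. Under this rule, p = 0 gives a fixed sliding window and p = 1 gives full causal attention.

```python
import math


def power_window_sizes(seq_len: int, p: float, base_window: int = 1) -> list[int]:
    """Per-query window widths under a hypothetical power-law rule.

    Query i (0-based) sees its ceil(base_window * (i + 1)^p) most recent
    tokens, capped at the i + 1 tokens that exist so far.
    NOTE: this rule is an illustrative assumption, not the paper's method.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    sizes = []
    for i in range(seq_len):
        n_available = i + 1  # causal: tokens 0..i
        grown = math.ceil(base_window * n_available ** p)
        sizes.append(min(n_available, grown))
    return sizes


def power_window_mask(seq_len: int, p: float, base_window: int = 1) -> list[list[bool]]:
    """Boolean mask: entry [i][j] is True when query i may attend to key j."""
    sizes = power_window_sizes(seq_len, p, base_window)
    return [
        [(i - sizes[i] < j <= i) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(seq_len: int, p: float, base_window: int = 1) -> int:
    """Total number of (query, key) pairs kept by the mask."""
    return sum(power_window_sizes(seq_len, p, base_window))


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        print(p, attended_pairs(1024, p))
```

Under this assumed rule the number of kept pairs is the sum of (i+1)^p over positions, which grows like L^{1+p}/(1+p), so the sketch matches the stated order while leaving the paper's actual sparsity pattern unspecified.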

Taxonomy

Topics: Advanced Memory and Neural Computing · Low-power high-performance VLSI design · Parallel Computing and Optimization Techniques