The Condensate Theorem: Transformers are $O(n)$, Not $O(n^2)$

Jorge L. Ruiz Williams

arXiv:2602.06317·cs.LG·February 11, 2026


TL;DR

This paper introduces the Condensate Theorem, showing that attention sparsity in transformers is a learned topological property. This enables lossless, linear-time attention computation, with a reported 159x measured speedup at 131K tokens and reduced inference cost.

Contribution

It demonstrates that attention can be computed losslessly in linear time by projecting onto a learned topological manifold, challenging the assumption that attention must cost quadratic time in sequence length.
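A rough operation count shows where a linear bound would come from. The notation below is hypothetical: $A$ is the anchor set, $W_w(i)$ a causal window of width $w$ ending at query $i$, and $T_k(i)$ the $k$ dynamically selected positions, mirroring the Anchor + Window + Dynamic Top-k components named in the abstract. The count assumes $|A|$, $w$, $k$, and head dimension $d$ stay fixed as sequence length $n$ grows, and it leaves out the cost of locating $T_k(i)$, which the abstract says is done without checking every position.

```latex
\begin{aligned}
S(i) &= A \cup W_w(i) \cup T_k(i), \qquad W_w(i) = \{\, j : i - w < j \le i \,\} \\
|S(i)| &\le |A| + w + k \\
\text{projected cost} &= \sum_{i=1}^{n} |S(i)|\, d \;\le\; n\,(|A| + w + k)\, d = O(n) \\
\text{full causal cost} &= \sum_{i=1}^{n} i\, d \;=\; \frac{n(n+1)}{2}\, d = O(n^2)
\end{aligned}
```

Under these assumptions the per-query work is bounded by a constant, so the total grows linearly in $n$, while dense causal attention grows with the triangular number $n(n+1)/2$.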

Findings

01. Attention mass concentrates on a topological manifold in trained models.
02. Projection onto the Condensate Manifold achieves lossless $O(n)$ attention.
03. Significant speedups in inference performance across multiple models.

Abstract

We present the Condensate Theorem: attention sparsity is a learned topological property, not an architectural constraint. Through empirical analysis of trained language models, we find that attention mass concentrates on a distinct topological manifold, and that this manifold can be identified dynamically without checking every position. We prove a general result: for any query, projecting attention onto the Condensate Manifold (Anchor + Window + Dynamic Top-k) achieves 100% output equivalence with full $O(n^2)$ attention. This is not an approximation; it is lossless parity. We validate this across GPT-2, Pythia, Qwen2, TinyLlama, and Mistral, demonstrating bit-exact token matching on 1,500+ generated tokens. By mapping this topology to hardware, our Topological Attention kernel achieves a 159x measured speedup at 131K tokens (3.94ms vs 628ms) and a projected >1,200x speedup at 1M…
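The Anchor + Window + Dynamic Top-k projection described in the abstract can be illustrated with a small pure-Python sketch. Everything here is an assumption for illustration, not the author's method: the helper names (`condensate_indices`, `projected_attention`), taking the first `n_anchor` positions as anchors, a causal window of `window` positions ending at the query, and choosing the top-`k` remaining keys by their exact logits. In particular, this sketch scores every key before picking the top-k, so it does not reproduce the paper's dynamic identification or its linear-time kernel; it only shows the shape of the index set and lets one measure how far the restricted output drifts from dense causal attention.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax_weighted_sum(scores, values):
    """Softmax over `scores`, then the weighted sum of the matching `values` rows."""
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    dim = len(values[0])
    return [sum(w * v[c] for w, v in zip(weights, values)) / z for c in range(dim)]


def condensate_indices(i, scores, n_anchor, window, k):
    """Hypothetical Anchor + Window + Top-k index set for causal query i.

    scores[j] is the attention logit of query i against key j, for j <= i.
    """
    keep = set(range(min(n_anchor, i + 1)))            # anchor: first positions
    keep |= set(range(max(0, i - window + 1), i + 1))  # local window ending at i
    rest = [j for j in range(i + 1) if j not in keep]
    rest.sort(key=lambda j: scores[j], reverse=True)   # highest logits first
    keep |= set(rest[:k])                              # top-k of the remainder
    return sorted(keep)


def full_attention(q, keys, values):
    """Dense causal softmax attention: query i attends to keys 0..i."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i in range(len(q)):
        scores = [dot(q[i], keys[j]) * scale for j in range(i + 1)]
        out.append(softmax_weighted_sum(scores, values[: i + 1]))
    return out


def projected_attention(q, keys, values, n_anchor=1, window=2, k=2):
    """Causal attention restricted to the hypothetical index set above.

    NOTE: this sketch still scores every key to pick the top-k, so it is
    O(n^2) overall; it shows the index-set structure, not a fast kernel.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i in range(len(q)):
        scores = [dot(q[i], keys[j]) * scale for j in range(i + 1)]
        idx = condensate_indices(i, scores, n_anchor, window, k)
        out.append(softmax_weighted_sum([scores[j] for j in idx],
                                        [values[j] for j in idx]))
    return out


def max_abs_diff(a, b):
    """Largest elementwise gap between two lists of output vectors."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    # Deterministic toy data: 16 positions, head dimension 4.
    n, d = 16, 4
    q = [[math.sin(0.7 * i + c) for c in range(d)] for i in range(n)]
    keys = [[math.cos(0.3 * i - c) for c in range(d)] for i in range(n)]
    values = [[math.sin(0.1 * i * (c + 1)) for c in range(d)] for i in range(n)]
    full = full_attention(q, keys, values)
    print("k=2 gap:", max_abs_diff(full, projected_attention(q, keys, values, k=2)))
    print("k=n gap:", max_abs_diff(full, projected_attention(q, keys, values, k=n)))
```

When `k` is at least the sequence length, every earlier position lands in the index set and the two outputs coincide exactly. With small `k` the gap depends entirely on the toy data, so this sketch makes no claim about the paper's lossless-parity result on trained models.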
