# From TLinFormer to TConstFormer: The Leap to Constant-Time Transformer Attention: Achieving O(1) Computation and O(1) KV Cache during Autoregressive Inference

**Authors:** Zhongpan Tang

arXiv: 2509.00202 · 2025-09-03

## TL;DR

TConstFormer is a Transformer architecture that keeps the KV cache at a constant O(1) size and makes per-token computation O(1) in an amortized sense during autoregressive inference, cutting memory and compute costs for ultra-long sequences and streaming applications.

## Contribution

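The paper presents TConstFormer, which builds on the author's earlier TLinFormer. Its periodic state update mechanism yields an O(1) KV cache and amortized O(1) computation: $k-1$ consecutive constant-time steps (e.g., $k=256$) followed by a single linear-time global information synchronization on the $k$-th step.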
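
To make the memory claim concrete, the following is a minimal sketch comparing a conventional KV cache, which stores keys and values for every past token, with a cache that holds a fixed number of positions. The model dimensions in `EXAMPLE_CFG` and the `state_tokens` parameter are hypothetical values chosen for illustration; this summary does not state the paper's actual state size or model configuration.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes needed to cache keys and values for `num_tokens` positions.

    The leading factor 2 accounts for storing both K and V.
    bytes_per_elem=2 corresponds to 16-bit floats.
    """
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem


def standard_cache_bytes(seq_len, **cfg):
    """Conventional cache: one K/V entry per generated token, so O(N) memory."""
    return kv_cache_bytes(seq_len, **cfg)


def constant_cache_bytes(seq_len, state_tokens, **cfg):
    """Fixed-size cache: `seq_len` is intentionally unused, so memory is O(1).

    `state_tokens` is a hypothetical fixed number of cached positions.
    """
    return kv_cache_bytes(state_tokens, **cfg)


# Hypothetical model dimensions, for illustration only.
EXAMPLE_CFG = {"num_layers": 12, "num_kv_heads": 12, "head_dim": 64}

if __name__ == "__main__":
    for n in (1_024, 16_384, 262_144):
        std = standard_cache_bytes(n, **EXAMPLE_CFG) / 2**20
        const = constant_cache_bytes(n, state_tokens=512, **EXAMPLE_CFG) / 2**20
        print(f"N={n:>7}: standard {std:9.1f} MiB, fixed-size {const:6.1f} MiB")
```

Under these assumptions the conventional cache doubles whenever the sequence length doubles, while the fixed-size cache does not change with sequence length.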

## Key findings

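- Reported to outperform baseline models in speed and memory efficiency
- Reported to outperform baselines in overall performance on long-text inference tasks
- Keeps the KV cache at a constant O(1) size with amortized O(1) computation, supported by theoretical calculations and experimental results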

## Abstract

Although the Transformer has become the cornerstone of modern AI, its autoregressive inference suffers from a linearly growing KV Cache and a computational complexity of O(N^2 d), severely hindering its ability to process ultra-long sequences. To overcome this limitation, this paper introduces the TConstFormer architecture, building upon our previous work, TLinFormer. TConstFormer employs an innovative periodic state update mechanism to achieve a truly constant-size O(1) KV Cache. The computational complexity of this mechanism is also O(1) in an amortized sense: it performs purely constant-time computations for $k-1$ consecutive steps (e.g., $k=256$) and executes a single linear-time global information synchronization only on the $k$-th step. Theoretical calculations and experimental results demonstrate that TConstFormer exhibits an overwhelming advantage over baseline models in terms of speed, memory efficiency, and overall performance on long-text inference tasks. This breakthrough paves the way for efficient and robust streaming language model applications.
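
The update schedule described in the abstract can be illustrated with a toy cost model. This is only a sketch of the schedule, not the paper's implementation: the function name `per_step_costs`, the unit costs, and the assumption that the synchronization cost scales with the number of tokens generated so far are all hypothetical choices made for illustration.

```python
def per_step_costs(num_steps, k=256, const_cost=1, sync_cost_per_token=1):
    """Toy per-step cost under a periodic update schedule.

    Steps that are not a multiple of k cost `const_cost` (constant time).
    Every k-th step instead performs a global synchronization whose cost is
    assumed here to be linear in the number of tokens generated so far.
    Costs are in arbitrary units.
    """
    costs = []
    for t in range(1, num_steps + 1):
        if t % k == 0:
            costs.append(sync_cost_per_token * t)  # linear-time sync step
        else:
            costs.append(const_cost)  # constant-time step
    return costs


if __name__ == "__main__":
    # Small k so the pattern is visible: three cheap steps, then one sync.
    print(per_step_costs(8, k=4))  # [1, 1, 1, 4, 1, 1, 1, 8]
```

With the abstract's example of $k=256$, 255 of every 256 decoding steps take the constant-time path in this model, and only the remaining one pays the synchronization cost.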

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00202/full.md

## Figures

26 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00202/full.md

## References

5 references — full list in the complete paper: https://tomesphere.com/paper/2509.00202/full.md

---
Source: https://tomesphere.com/paper/2509.00202