# Rethinking Transformer Connectivity: TLinFormer, A Path to Exact, Full Context-Aware Linear Attention

**Authors:** Zhongpan Tang

arXiv: 2508.20407 · 2025-08-29

## TL;DR

TLinFormer is a linear attention architecture that computes exact attention scores with full-context awareness at linear complexity. Against a standard Transformer baseline on long-sequence inference tasks, it reports advantages in inference latency, KV cache efficiency, memory footprint, and overall speedup.

## Contribution

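The paper presents TLinFormer, a linear attention model that reconfigures neuron connection patterns to reach strict linear complexity while still computing exact attention scores and keeping the information flow aware of the full historical context. This targets the main limitation of earlier linear attention methods, which typically trade model performance for efficiency by relying on data-agnostic kernel approximations or restrictive context selection.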

## Key findings

- On long-sequence inference tasks, TLinFormer achieves lower inference latency than a standard Transformer baseline.
- It uses the KV cache more efficiently and has a smaller memory footprint than the baseline.
- It achieves a substantial overall speedup over the baseline on the same tasks.
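To give a sense of why KV cache size matters for the standard Transformer baseline, the following back-of-the-envelope sketch estimates how a conventional KV cache grows with sequence length. It assumes the usual layout of one key and one value vector per token, per head, per layer; the function name `kv_cache_bytes` and the model dimensions are hypothetical and not taken from the paper, and the sketch says nothing about how TLinFormer itself organizes its cache.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim,
                   bytes_per_value=2, batch_size=1):
    """Bytes held by a conventional Transformer KV cache.

    Each layer stores one key and one value vector per token per head,
    hence the leading factor of 2. bytes_per_value=2 assumes 16-bit floats.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return batch_size * seq_len * per_token


# Hypothetical configuration for illustration; not taken from the paper.
config = dict(num_layers=12, num_kv_heads=12, head_dim=64)

for seq_len in (1_024, 8_192, 65_536):
    mib = kv_cache_bytes(seq_len, **config) / 2**20
    print(f"{seq_len:>6} tokens: {mib:,.0f} MiB")
```

Under these assumed dimensions the cache grows from 36 MiB at 1,024 tokens to 2,304 MiB at 65,536 tokens, which is the kind of linear-in-length memory growth that a more cache-efficient design would aim to reduce.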

## Abstract

The Transformer architecture has become a cornerstone of modern artificial intelligence, but its core self-attention mechanism suffers from a complexity bottleneck that scales quadratically with sequence length, severely limiting its application in long-sequence tasks. To address this challenge, existing linear attention methods typically sacrifice model performance by relying on data-agnostic kernel approximations or restrictive context selection. This paper returns to the first principles of connectionism, starting from the topological structure of information flow, to introduce a novel linear attention architecture, **TLinFormer**. By reconfiguring neuron connection patterns, TLinFormer achieves strict linear complexity while computing exact attention scores and ensuring information flow remains aware of the full historical context. This design aims to bridge the performance gap prevalent between existing efficient attention methods and standard attention. Through a series of experiments, we systematically evaluate the performance of TLinFormer against a standard Transformer baseline on long-sequence inference tasks. The results demonstrate that TLinFormer exhibits overwhelming advantages in key metrics such as **inference latency**, **KV cache efficiency**, **memory footprint**, and **overall speedup**.
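The quadratic bottleneck mentioned in the abstract can be made concrete with a minimal sketch of standard causal softmax attention that counts how many query-key scores it evaluates. This illustrates the baseline mechanism only, not TLinFormer's connectivity, which this summary does not describe; the helper names `causal_attention` and `toy_sequence` and the toy data are assumptions introduced for illustration.

```python
import math


def causal_attention(queries, keys, values):
    """Exact causal softmax attention over lists of equal-length vectors.

    Returns (outputs, score_count), where score_count is the number of
    query-key dot products that were evaluated.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    score_count = 0
    for i, q in enumerate(queries):
        # Position i may attend to positions 0..i (causal mask).
        scores = []
        for k in keys[: i + 1]:
            scores.append(scale * sum(a * b for a, b in zip(q, k)))
            score_count += 1
        # Numerically stable softmax over the visible positions.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the visible value vectors.
        visible = values[: i + 1]
        out = [sum(w * v[d] for w, v in zip(weights, visible))
               for d in range(len(values[0]))]
        outputs.append(out)
    return outputs, score_count


def toy_sequence(n, dim=4):
    """Deterministic toy vectors; hypothetical data for illustration only."""
    return [[math.sin(i + d) for d in range(dim)] for i in range(n)]


for n in (8, 16, 32):
    x = toy_sequence(n)
    _, count = causal_attention(x, x, x)
    print(n, count)  # count equals n * (n + 1) // 2
```

For a sequence of length n the count is n(n+1)/2, so doubling the sequence length roughly quadruples the number of scores. This is the growth that linear attention designs try to avoid.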

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20407/full.md

## Figures

24 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20407/full.md

## References

9 references — full list in the complete paper: https://tomesphere.com/paper/2508.20407/full.md

---
Source: https://tomesphere.com/paper/2508.20407