Krause Synchronization Transformers

Jingkun Liu; Yisong Yue; Max Welling; Yue Song

arXiv:2602.11534 · cs.LG · May 15, 2026

TL;DR

Krause Attention is a localized, sparse attention mechanism inspired by bounded-confidence consensus dynamics. By curbing the global synchronization that softmax attention induces, it improves both efficiency and performance in vision and language models.

Contribution

The paper proposes Krause Attention, an attention mechanism that promotes local rather than global synchronization, reduces computational complexity, and improves model performance across diverse tasks.

Findings

01

Krause Attention reduces runtime complexity from quadratic to linear.

02

It achieves consistent performance gains in vision and language models.

03

It alleviates attention sink phenomena and representation collapse.

Abstract

Self-attention in Transformers relies on globally normalized softmax weights, causing all tokens to compete for influence at every layer. When composed across depth, this interaction pattern induces strong synchronization dynamics that favor convergence toward a dominant mode, a behavior associated with representation collapse and attention sink phenomena. We introduce Krause Attention, a principled attention mechanism inspired by bounded-confidence consensus dynamics. Krause Attention replaces similarity-based global aggregation with distance-based, localized, and selectively sparse interactions, promoting structured local synchronization instead of global mixing. We relate this behavior to recent theory modeling Transformer dynamics as interacting particle systems, and show how bounded-confidence interactions naturally moderate attention concentration and alleviate attention sinks.…
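
The abstract contrasts globally normalized softmax aggregation with distance-based, bounded-confidence interactions. The toy sketch below is not the authors' Krause Attention. It assumes scalar tokens, identity query/key/value maps, and the classical Hegselmann-Krause update (each token averages uniformly over tokens within a radius `eps`) as a stand-in for bounded-confidence interactions. The function names, `beta`, `eps`, and the token values are all hypothetical, chosen only to show global versus local synchronization when one update is composed across many layers.

```python
import math

# Toy illustration only: NOT the paper's Krause Attention.
# Assumes scalar tokens and identity query/key/value maps.


def softmax_attention_step(xs, beta=1.0):
    """One step of globally normalized softmax attention on scalar tokens.

    Every token averages over ALL tokens, weighted by exp(beta * x_i * x_j),
    so all tokens compete for influence at every step.
    """
    out = []
    for xi in xs:
        scores = [beta * xi * xj for xj in xs]
        m = max(scores)  # subtract the max score for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * xj for w, xj in zip(weights, xs)) / total)
    return out


def krause_step(xs, eps=1.0):
    """One Hegselmann-Krause (bounded-confidence) update on scalar tokens.

    Each token averages uniformly over tokens within distance eps of itself
    (itself included); tokens farther away get exactly zero weight.
    """
    out = []
    for xi in xs:
        neighbors = [xj for xj in xs if abs(xi - xj) <= eps]
        out.append(sum(neighbors) / len(neighbors))
    return out


def run(step, xs, n_layers, **kwargs):
    """Apply the same update n_layers times, mimicking stacked layers."""
    for _ in range(n_layers):
        xs = step(xs, **kwargs)
    return xs


def count_clusters(xs, tol=1e-3):
    """Count distinct positions, merging sorted points within tol of a cluster's first point."""
    starts = []
    for x in sorted(xs):
        if not starts or x - starts[-1] > tol:
            starts.append(x)
    return len(starts)


# Hypothetical input: two well-separated groups of three tokens each.
tokens = [0.0, 0.2, 0.4, 3.0, 3.2, 3.4]
softmax_out = run(softmax_attention_step, tokens, n_layers=50, beta=1.0)
krause_out = run(krause_step, tokens, n_layers=50, eps=1.0)
print(count_clusters(softmax_out))  # 1: every token collapses to one value
print(count_clusters(krause_out))   # 2: groups settle near 0.2 and 3.2
```

Stacking the softmax update pulls all six tokens to a single value, because every weight is strictly positive. The bounded-confidence update settles into two groups after one step, since no token lies within `eps` of the other group. Both functions still compare every pair of tokens, so this sketch says nothing about the paper's linear-complexity claim.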
