Attention (as Discrete-Time Markov) Chains

Yotam Erel; Olaf Dünkel; Rishabh Dabral; Vladislav Golyanik; Christian Theobalt; Amit H. Bermano

arXiv:2507.17657·cs.CV·October 21, 2025

TL;DR

This paper interprets the attention matrix in transformers as a discrete-time Markov chain. Analyzing the chain's metastable states and its steady-state vector, TokenRank, yields state-of-the-art zero-shot segmentation and improved image generation.

Contribution

The paper introduces a Markov chain perspective on attention that supports analysis of token importance and of how attention propagates between tokens, leading to state-of-the-art zero-shot segmentation and improved image generation.

Findings

01

Metastable states correspond to semantically similar regions.

02

TokenRank improves image generation quality and diversity.

03

The framework achieves state-of-the-art zero-shot segmentation.

Abstract

We introduce a new interpretation of the attention matrix as a discrete-time Markov chain. Our interpretation sheds light on common operations involving attention scores such as selection, summation, and averaging in a unified framework. It further extends them by considering indirect attention, propagated through the Markov chain, as opposed to previous studies that only model immediate effects. Our key observation is that tokens linked to semantically similar regions form metastable states, i.e., regions where attention tends to concentrate, while noisy attention scores dissipate. Metastable states and their prevalence can be easily computed through simple matrix multiplication and eigenanalysis, respectively. Using these lightweight tools, we demonstrate state-of-the-art zero-shot segmentation. Lastly, we define TokenRank -- the steady state vector of the Markov chain, which measures…
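As a rough illustration of the idea in the abstract, the sketch below treats a softmax attention matrix as the transition matrix of a Markov chain (each row sums to 1), propagates attention over several steps by matrix multiplication, and estimates a steady-state vector by power iteration. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `softmax_rows`, `propagate`, and `steady_state` are hypothetical, the scores are random stand-ins for real attention logits, and the row-as-source convention (row i gives the probabilities of moving from token i to each other token) is assumed rather than taken from the paper.

```python
import numpy as np


def softmax_rows(scores):
    """Row-wise softmax; every row of the result sums to 1 (row-stochastic)."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)


def propagate(attention, steps):
    """Indirect attention after `steps` transitions: the matrix power A^steps."""
    return np.linalg.matrix_power(attention, steps)


def steady_state(attention, iters=1000, tol=1e-12):
    """Power iteration for a vector pi with pi = pi @ A (assumed convention)."""
    n = attention.shape[0]
    pi = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(iters):
        nxt = pi @ attention
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical toy input: random scores standing in for attention logits.
rng = np.random.default_rng(0)
scores = rng.normal(size=(6, 6))

A = softmax_rows(scores)   # one-step transition probabilities
A3 = propagate(A, 3)       # three-step (indirect) attention
rank = steady_state(A)     # steady-state vector of the chain
```

Because every softmax entry is strictly positive, the toy chain is irreducible and aperiodic, so the power iteration converges to a unique distribution. Entries of `rank` can then be read as a global importance score per token, in the spirit of what the abstract calls TokenRank.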

Videos

Attention (as Discrete-Time Markov) Chains · SlidesLive

Taxonomy

Topics: Neural Networks and Applications