ASAP: Attention Sink Anchored Pruning

Jaehyuk Lee; Hanyoung Kim; Yanggee Kim; Donghun Lee

arXiv:2605.22372 · cs.LG · May 22, 2026

TL;DR

ASAP is a training-free token pruning framework for Vision Transformers. It models information flow as a Lazy Random Walk to locate the attention sink, then compresses the uninformative tokens anchored to it, raising throughput by up to 48% while maintaining or exceeding baseline accuracy.

Contribution

The paper introduces a sink-based token pruning method built on diffusion distance and clustering, which outperforms existing token reduction techniques across multiple vision tasks.

Findings

01. ASAP accelerates throughput by up to 48%.

02. It maintains or exceeds baseline accuracy.

03. It outperforms state-of-the-art token reduction methods.

Abstract

Vision Transformers (ViTs) face severe computational bottlenecks due to the quadratic complexity of self-attention at high resolutions. Existing token reduction methods rely on local metrics, such as single-layer attention scores, that are inherently vulnerable to the attention sink phenomenon, where uninformative tokens are paradoxically preserved over salient foreground objects. We propose ASAP (Attention Sink Anchored Pruning), a training-free framework that recasts this sink as a feature. Modeling ViT information flow as a Lazy Random Walk, ASAP identifies the sink as a dominant accumulator of probability mass. By computing the diffusion distance to the sink within the cumulative transition matrix, ASAP partitions tokens via Radial Diffusion Clustering and compresses background redundancy through Transition Weight Pooling in a single shot. Extensive experiments across image,…
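The pipeline named in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so every concrete choice here is an assumption made for illustration. Those assumptions are the lazy-walk form P = aI + (1 - a)A with a = 0.5, the rollout-style layer product, picking the sink as the largest column sum, an unweighted Euclidean distance between rows standing in for diffusion distance, equal-size distance rings standing in for Radial Diffusion Clustering, and a mass-weighted average standing in for Transition Weight Pooling. The function names and the `keep_ratio` and `n_rings` parameters are hypothetical.

```python
import numpy as np

def lazy_cumulative_transition(attn_layers, laziness=0.5):
    """Product of per-layer lazy-walk matrices P_l = a*I + (1 - a)*A_l (assumed form)."""
    n = attn_layers[0].shape[0]
    T = np.eye(n)
    for A in attn_layers:
        P = laziness * np.eye(n) + (1.0 - laziness) * A
        T = P @ T  # rollout-style ordering, an assumption
    return T

def sink_anchored_prune(tokens, attn_layers, keep_ratio=0.5, n_rings=2):
    """Keep tokens far from the sink; pool near-sink tokens ring by ring."""
    T = lazy_cumulative_transition(attn_layers)
    mass = T.sum(axis=0)                        # mass accumulated by each token
    sink = int(np.argmax(mass))                 # assumed sink criterion
    dist = np.linalg.norm(T - T[sink], axis=1)  # simplified distance to the sink row
    order = np.argsort(dist)                    # nearest to the sink first
    n_keep = int(round(keep_ratio * len(tokens)))
    n_near = len(tokens) - n_keep
    near, far = order[:n_near], order[n_near:]
    kept = tokens[np.sort(far)]                 # preserve original token order
    rings = [r for r in np.array_split(near, n_rings) if len(r) > 0]
    pooled = [np.average(tokens[r], axis=0, weights=mass[r]) for r in rings]
    out = np.vstack([kept] + pooled) if pooled else kept
    return out, sink

# Toy input: 16 tokens, 3 layers, with token 0 made an artificial attention sink.
rng = np.random.default_rng(0)
n, d = 16, 8
tokens = rng.normal(size=(n, d))
attn_layers = []
for _ in range(3):
    logits = rng.normal(size=(n, n))
    logits[:, 0] += 5.0                         # every token attends strongly to token 0
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn_layers.append(e / e.sum(axis=1, keepdims=True))

pruned, sink = sink_anchored_prune(tokens, attn_layers, keep_ratio=0.5, n_rings=2)
```

In this toy setup half of the tokens are kept unchanged and the half closest to the sink is merged into two pooled tokens, all in one pass over precomputed attention maps, which mirrors the training-free, single-shot framing of the abstract.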
