SPOT: Sparsification with Attention Dynamics via Token Relevance in Vision Transformers

Oded Schlesinger; Amirhossein Farzam; J. Matias Di Martino; Guillermo Sapiro

arXiv:2511.10488 · cs.CV · November 14, 2025

TL;DR

SPOT identifies and removes redundant tokens in Vision Transformers using attention dynamics, improving computational efficiency by up to 40% without sacrificing accuracy.

Contribution

The paper proposes a lightweight, adaptable framework for early detection of token relevance in ViTs, aimed at improving both efficiency and interpretability.

Findings

01 Achieves up to 40% computational efficiency gains.

02 Maintains or improves accuracy with token sparsification.

03 Supports various ViT architectures with plug-in predictors.

Abstract

While Vision Transformers (ViT) have demonstrated remarkable performance across diverse tasks, their computational demands are substantial, scaling quadratically with the number of processed tokens. Compact attention representations, reflecting token interaction distributions, can guide early detection and reduction of less salient tokens prior to attention computation. Motivated by this, we present SParsification with attentiOn dynamics via Token relevance (SPOT), a framework for early detection of redundant tokens within ViTs that leverages token embeddings, interactions, and attention dynamics across layers to infer token importance, resulting in a more context-aware and interpretable relevance detection process. SPOT informs token sparsification and facilitates the elimination of such tokens, improving computational efficiency without sacrificing performance. SPOT employs…
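To make the quadratic-scaling argument concrete, here is a minimal sketch of generic score-based token pruning. It is not SPOT's relevance predictor, which the truncated abstract does not describe. The function names (`attention_pair_count`, `keep_top_tokens`), the `keep_ratio` parameter, and the score values are all hypothetical, and the numbers are illustrative rather than results from the paper.

```python
def attention_pair_count(num_tokens: int) -> int:
    """Query-key pairs scored by one self-attention layer (grows as n^2)."""
    return num_tokens * num_tokens


def keep_top_tokens(scores: list[float], keep_ratio: float) -> list[int]:
    """Return indices of the highest-scoring tokens, in their original order.

    `scores` stands in for a per-token relevance estimate; how SPOT computes
    such an estimate is NOT modeled here (assumption: one float per token).
    """
    num_keep = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:num_keep])


# Placeholder relevance scores for 8 tokens (made-up values).
scores = [0.9, 0.1, 0.4, 0.8, 0.05, 0.7, 0.2, 0.6]
kept = keep_top_tokens(scores, keep_ratio=0.5)

pairs_before = attention_pair_count(len(scores))
pairs_after = attention_pair_count(len(kept))
print(kept, pairs_before, pairs_after)
```

In this toy setup, dropping half of the tokens before attention cuts the number of query-key pairs to a quarter, which is why removing tokens early, before the attention computation, pays off more than removing them afterwards.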

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning