Rényi Entropy: A New Token Pruning Metric for Vision Transformers
Wei-Yuan Su, Ruijie Zhang, Zheng Zhang

TL;DR
This paper introduces Col-Ln, a training-free token importance metric based on Rényi entropy, improving token pruning in Vision Transformers by identifying informative tokens early, leading to more efficient inference.
Contribution
The paper proposes a novel, training-free token importance metric derived from Rényi entropy that enhances early-layer token pruning in Vision Transformers.
Findings
Outperforms existing pruning methods across multiple benchmarks.
Enables reliable token importance estimation from the first layer.
Reduces inference complexity without retraining.
Abstract
Vision Transformers (ViTs) achieve state-of-the-art performance but suffer from the complexity of self-attention, making inference costly for high-resolution inputs. To address this bottleneck, token pruning has emerged as a critical technique to accelerate inference. Most existing methods rely on the [CLS] token to estimate patch importance. However, we argue that the [CLS] token can be unreliable in early layers, where semantic representations are still immature. As a result, pruning in early layers often leads to inaccurate importance estimation and unnecessary information loss. In this work, we propose a training-free token importance metric, Col-Ln, derived from Rényi entropy. It identifies informative tokens from the first layer of the network, enabling more reliable pruning in token reduction. Extensive experiments on ViTs…
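The abstract names Rényi entropy as the basis of the metric but, as excerpted here, does not give the Col-Ln formula. As a rough illustration only, the sketch below computes the standard Rényi entropy of order alpha for a discrete distribution, H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha), and shows a generic keep-top-k selection step of the kind token pruning uses. The function names `renyi_entropy` and `keep_top_k`, the default `alpha=2.0`, and the score values are assumptions made for this example; this is not the authors' Col-Ln implementation, and how Col-Ln turns Rényi entropy into per-token scores is not stated in this summary.

```python
import math


def renyi_entropy(p, alpha=2.0):
    """Standard Rényi entropy (in nats) of a discrete distribution p.

    Hypothetical helper for illustration; not the paper's Col-Ln metric.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    total = sum(p)
    # Normalize to sum to 1 and drop zero-mass entries.
    probs = [x / total for x in p if x > 0]
    if abs(alpha - 1.0) < 1e-12:
        # The alpha -> 1 limit is the Shannon entropy.
        return -sum(x * math.log(x) for x in probs)
    return math.log(sum(x ** alpha for x in probs)) / (1.0 - alpha)


def keep_top_k(scores, k):
    """Indices of the k highest-scoring tokens, in their original order.

    Generic selection step; the scores themselves are an assumption here.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# A uniform distribution over 4 outcomes has entropy log(4) for any alpha;
# a one-hot distribution has entropy 0.
uniform_entropy = renyi_entropy([0.25, 0.25, 0.25, 0.25], alpha=2.0)
one_hot_entropy = renyi_entropy([1.0, 0.0, 0.0, 0.0], alpha=2.0)

# Placeholder per-token importance scores (made up for the example).
kept = keep_top_k([0.1, 0.9, 0.5, 0.7], k=2)
```

In this sketch `kept` holds the indices of the two highest-scoring tokens; a pruning step would forward only those tokens to later layers.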
