Which Tokens to Use? Investigating Token Reduction in Vision Transformers
Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor, Thomas B. Moeslund

TL;DR
This paper investigates how different token reduction methods in Vision Transformers affect model efficiency and performance, revealing that reduction patterns vary across methods and datasets, with some patterns serving as proxies for model success.
Contribution
It provides a comprehensive analysis of 10 token reduction methods across multiple datasets, highlighting the effectiveness of Top-K pruning and the correlation of reduction patterns with model performance.
Findings
Top-K pruning is a strong baseline.
Reduction patterns vary with model capacity.
Pattern similarity correlates with performance.
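To make the first finding concrete, here is a minimal sketch of what Top-K token pruning could look like. It rests on assumptions not stated on this page: that each patch token has one scalar importance score (for example the class token's attention to that patch, averaged over heads), and that the K highest-scoring tokens are kept in their original order. The function name `top_k_prune` and its parameters are hypothetical and not taken from the paper's code.

```python
from typing import List, Sequence, Tuple


def top_k_prune(
    tokens: Sequence[str],
    scores: Sequence[float],
    keep: int,
) -> Tuple[List[str], List[int]]:
    """Keep the `keep` highest-scoring tokens, preserving their original order.

    Assumption: `scores[i]` is a scalar importance score for `tokens[i]`,
    e.g. class-token attention averaged over heads. This is an illustrative
    sketch, not the authors' implementation.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0 <= keep <= len(tokens):
        raise ValueError("keep must be between 0 and the number of tokens")

    # Rank token indices from highest to lowest score (stable for ties).
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)

    # Retain the top `keep` indices, then restore spatial (original) order.
    kept_indices = sorted(ranked[:keep])
    return [tokens[i] for i in kept_indices], kept_indices


# Toy example: five patch tokens, keep the three with the highest scores.
patches = ["p0", "p1", "p2", "p3", "p4"]
attention = [0.05, 0.40, 0.10, 0.30, 0.15]
kept_tokens, kept_indices = top_k_prune(patches, attention, keep=3)
print(kept_tokens, kept_indices)
```

In a real ViT the tokens would be embedding vectors rather than strings, and pruning would be applied at selected layers. The selection logic stays the same under the assumptions above.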
Abstract
Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs more efficient by removing redundant information in the processed tokens. While different methods have been explored to achieve this goal, we still lack understanding of the resulting reduction patterns and how those patterns differ across token reduction methods and datasets. To close this gap, we set out to understand the reduction patterns of 10 different token reduction methods using four image classification datasets. By systematically comparing these methods on the different classification tasks, we find that the Top-K pruning method is a surprisingly strong baseline. Through in-depth analysis of the different methods, we determine that: the reduction patterns are generally not consistent when varying the capacity of the backbone model, the reduction patterns of pruning-based methods…
Taxonomy
Topics: Currency Recognition and Detection · CCD and CMOS Imaging Sensors · Cell Image Analysis Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Label Smoothing · Adam · Residual Connection · Dense Connections · Dropout
