Which Tokens to Use? Investigating Token Reduction in Vision Transformers

Joakim Bruslund Haurum; Sergio Escalera; Graham W. Taylor; Thomas B. Moeslund

arXiv:2308.04657 · cs.CV · August 10, 2023 · 2 cites

Open Access · 1 Repo · 1 Model

TL;DR

This paper investigates how different token reduction methods in Vision Transformers affect model efficiency and performance, revealing that reduction patterns vary across methods and datasets, with some patterns serving as proxies for model success.

Contribution

It provides a comprehensive analysis of 10 token reduction methods across multiple datasets, highlighting the effectiveness of Top-K pruning and the correlation of reduction patterns with model performance.

Findings

01 Top-K pruning is a strong baseline.

02 Reduction patterns vary with model capacity.

03 Pattern similarity correlates with performance.
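
To make "pattern similarity" concrete, the sketch below compares the sets of patch indices that two models keep for the same image, using intersection over union (IoU). This is an illustrative assumption: the summary above does not state which similarity measure the paper uses, and the function name `kept_token_iou` is hypothetical.

```python
def kept_token_iou(kept_a, kept_b):
    """Intersection over union of two collections of kept patch indices.

    Illustrative measure only; not necessarily the metric used in the paper.
    """
    a, b = set(kept_a), set(kept_b)
    union = a | b
    if not union:
        # Both models dropped every patch: treat the patterns as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical kept-patch indices from two models for the same image.
similarity = kept_token_iou([1, 3, 5], [3, 5, 7])
```

A value of 1.0 means both models keep exactly the same patches, and 0.0 means they keep none in common.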

Abstract

Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs more efficient by removing redundant information in the processed tokens. While different methods have been explored to achieve this goal, we still lack understanding of the resulting reduction patterns and how those patterns differ across token reduction methods and datasets. To close this gap, we set out to understand the reduction patterns of 10 different token reduction methods using four image classification datasets. By systematically comparing these methods on the different classification tasks, we find that the Top-K pruning method is a surprisingly strong baseline. Through in-depth analysis of the different methods, we determine that: the reduction patterns are generally not consistent when varying the capacity of the backbone model, the reduction patterns of pruning-based methods…
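
As a rough illustration of the Top-K pruning baseline named in the abstract, the sketch below keeps the K patch tokens with the highest scores and drops the rest. It is a minimal sketch under stated assumptions: each patch is scored by the attention it receives from the class token, the class token sits at index 0 and is always kept, and the names `top_k_prune` and `keep_rate` are hypothetical rather than taken from the authors' repository.

```python
import numpy as np


def top_k_prune(tokens, cls_attention, keep_rate):
    """Keep the highest-scoring patch tokens plus the class token.

    tokens:        array of shape (N + 1, D); row 0 is assumed to be the class token.
    cls_attention: array of shape (N,); assumed score of each patch token.
    keep_rate:     fraction of patch tokens to keep, in (0, 1].
    """
    num_patches = tokens.shape[0] - 1
    k = max(1, int(np.ceil(keep_rate * num_patches)))
    # Indices of the k highest-scoring patches (stable sort breaks ties by position).
    top = np.argsort(-cls_attention, kind="stable")[:k]
    # Restore the original spatial order of the surviving patches.
    keep = np.sort(top)
    pruned = np.concatenate([tokens[:1], tokens[1:][keep]], axis=0)
    return pruned, keep


# Toy example: 4 patch tokens of dimension 2, keeping half of them.
tokens = np.arange(10, dtype=float).reshape(5, 2)
scores = np.array([0.1, 0.4, 0.2, 0.3])
pruned, keep = top_k_prune(tokens, scores, keep_rate=0.5)
```

Here `keep` holds the indices of the surviving patches, which is the kind of reduction pattern the paper analyzes across methods and datasets.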

Code & Models

Repositories

JoakimHaurum/TokenReduction (PyTorch)

Models

🤗 joakimbh/TokenReduction (model)

Taxonomy

Topics: Currency Recognition and Detection · CCD and CMOS Imaging Sensors · Cell Image Analysis Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Label Smoothing · Adam · Residual Connection · Dense Connections · Dropout