Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration

Fanhu Zeng; Deli Yu; Zhenglun Kong; Hao Tang

arXiv:2506.05709·cs.CV·June 9, 2025

Open Access

TL;DR

This paper introduces a unified, training-free token transformation framework for vision transformers that reduces computational costs by compressing tokens through explicit matrix transformations, improving efficiency with minimal accuracy loss.

Contribution

It unifies existing token compression methods into a general, training-free matrix transformation framework, enabling effective acceleration of vision transformers across various tasks.

Findings

01. Reduces FLOPs by 40% on DeiT-S with only a 0.1% accuracy drop

02. Achieves 1.5x acceleration on DeiT-S

03. Extends to multiple dense prediction tasks with consistent improvements

Abstract

Vision transformers have been widely explored in various vision tasks. Because of their heavy computational cost, much interest has arisen in dynamically compressing vision transformers at the token level. Current methods mainly rely on token pruning or merging to reduce the number of tokens; in both, tokens are compressed exclusively, which causes great information loss, so post-training is inevitably required to recover performance. In this paper, we rethink token reduction and unify the process as an explicit form of token matrix transformation, in which all existing methods construct special forms of matrices within the framework. Furthermore, we propose a many-to-many Token Transforming framework that generalizes all existing methods and preserves the most information, even enabling training-free acceleration. We conduct extensive experiments to…
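The matrix view described in the abstract can be illustrated with a small sketch. It assumes the transformation takes the form Y = W X, where X stacks N tokens as rows and W is an M x N matrix with M < N; the abstract does not spell out this form or how the paper builds W, so the matrices, weights, and helper names below (`matmul`, `normalize_rows`) are hypothetical and chosen only to show how pruning, merging, and a many-to-many mapping differ.

```python
# Illustrative sketch only: token compression written as Y = W X,
# where X holds N tokens (rows) and W is an M x N matrix with M < N.
# The matrices below are hypothetical examples, not the paper's construction.

def matmul(W, X):
    """Multiply an M x N matrix by an N x D matrix (lists of lists)."""
    return [[sum(w * x for w, x in zip(row, col)) for col in zip(*X)] for row in W]

def normalize_rows(W):
    """Scale each row to sum to 1 so every output token is a weighted average."""
    return [[w / sum(row) for w in row] for row in W]

# Four tokens with two features each.
X = [[1.0, 0.0],
     [0.0, 1.0],
     [2.0, 2.0],
     [4.0, 0.0]]

# Pruning: each row selects one kept token; tokens 1 and 3 are dropped.
W_prune = [[1, 0, 0, 0],
           [0, 0, 1, 0]]

# Merging (many-to-one): each input token contributes to exactly one output.
W_merge = normalize_rows([[1, 1, 0, 0],
                          [0, 0, 1, 1]])

# Many-to-many: an input token may contribute to several outputs.
W_many = normalize_rows([[2, 1, 1, 0],
                         [0, 1, 1, 2]])

pruned = matmul(W_prune, X)
merged = matmul(W_merge, X)
transformed = matmul(W_many, X)
```

In this toy setup the pruning matrix discards two tokens outright, the merging matrix averages disjoint groups, and the many-to-many matrix lets tokens 1 and 2 influence both outputs. That last case is one way to read the abstract's claim that a many-to-many mapping keeps more information than exclusive assignment.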

Taxonomy

Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging

Methods: Dense Connections · Layer Normalization · Vision Transformer · Pruning