ToFe: Lagged Token Freezing and Reusing for Efficient Vision Transformer Inference

Haoyue Zhang; Jie Zhang; Song Guo

arXiv:2507.16260·cs.CV·July 23, 2025

Open Access

TL;DR

This paper introduces ToFe, a framework for vision transformers that freezes tokens and reuses them in later blocks instead of discarding them, cutting LV-ViT computational cost by 50% with an accuracy drop of less than 2%, which makes it suitable for resource-limited devices.

Contribution

ToFe is the first method to freeze and reuse tokens dynamically in vision transformers, balancing efficiency and performance through end-to-end training.

Findings

01. Reduces LV-ViT computational cost by 50%.

02. Achieves less than 2% accuracy drop.

03. Outperforms existing token reduction methods.

Abstract

Although vision transformers (ViTs) have shown remarkable success in various vision tasks, their computationally expensive self-attention hinders their deployment on resource-constrained devices. Token reduction, which discards less important tokens during forward propagation, has been proposed to enhance the efficiency of transformer models. However, existing methods handle unimportant tokens irreversibly, preventing their reuse in subsequent blocks. Because transformers focus on different information in different blocks, tokens reduced in early blocks might be useful later. Furthermore, to adapt transformer models for resource-constrained devices, it is crucial to strike a balance between model performance and computational overhead. To address these challenges, we introduce a novel Token Freezing and Reusing (ToFe) framework, where we identify important tokens at each…
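The contrast the abstract draws between discarding tokens and freezing them for later reuse can be illustrated with a toy sketch. This is not the authors' implementation: the names `split_tokens`, `run_block`, and `toy_block`, the `keep_ratio` parameter, and the hand-written importance scores are all hypothetical, and how ToFe actually scores tokens is not stated in the text above.

```python
from typing import Callable, List, Sequence, Tuple

Token = List[float]


def split_tokens(scores: Sequence[float], keep_ratio: float) -> Tuple[List[int], List[int]]:
    """Return (active, frozen) index lists; the top-scoring fraction stays active.

    Assumption: importance is a plain per-token score supplied by the caller.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, int(len(scores) * keep_ratio))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:n_keep]), sorted(order[n_keep:])


def run_block(tokens: List[Token], active: List[int],
              block: Callable[[List[Token]], List[Token]]) -> List[Token]:
    """Apply `block` to active tokens only; frozen tokens are carried forward unchanged."""
    updated = block([tokens[i] for i in active])
    out = [list(t) for t in tokens]  # frozen tokens are kept, not discarded
    for i, new_tok in zip(active, updated):
        out[i] = new_tok
    return out


def toy_block(active_tokens: List[Token]) -> List[Token]:
    """Hypothetical stand-in for a transformer block: adds 1 to every feature."""
    return [[x + 1.0 for x in tok] for tok in active_tokens]


tokens = [[0.0], [1.0], [2.0], [3.0]]

# Block 1: tokens 0 and 2 score highest, so tokens 1 and 3 are frozen.
active1, frozen1 = split_tokens([0.9, 0.1, 0.8, 0.2], keep_ratio=0.5)
after1 = run_block(tokens, active1, toy_block)

# Block 2: the scores change, and the previously frozen tokens are reused.
active2, frozen2 = split_tokens([0.1, 0.9, 0.2, 0.8], keep_ratio=0.5)
after2 = run_block(after1, active2, toy_block)
```

Because frozen tokens stay in the sequence at their original positions, a later block can reactivate them, which an irreversible pruning step cannot do. Each block still processes only half of the tokens in this toy setup.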

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · CCD and CMOS Imaging Sensors