Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation

Zixin Wang; Dong Gong; Sen Wang; Zi Huang; Yadan Luo

arXiv:2410.14729 · cs.CV · March 18, 2025

TL;DR

This paper introduces Token Condensation as Adaptation (TCA), a training-free method that improves vision-language model performance on unseen datasets by condensing visual tokens. It achieves accuracy gains of up to 21.4% while reducing GFLOPs by 12.2% to 48.9%.

Contribution

The paper proposes TCA, a novel training-free token condensation technique that enhances the zero-shot transfer and robustness of vision-language models such as CLIP.

Findings

1. Up to 21.4% performance improvement on cross-dataset benchmarks.
2. GFLOPs reduced by 12.2% to 48.9%.
3. Minimal hyperparameter dependency.

Abstract

Contrastive Language-Image Pretraining (CLIP) excels at learning generalizable image representations but often falls short in zero-shot inference on certain downstream datasets. Test-time adaptation (TTA) mitigates this issue by adjusting components like normalization layers or context prompts, yet it typically requires large batch sizes and extensive augmentations, leading to high computational costs. This raises a key question: Can VLMs' performance drop in specific test cases be mitigated through efficient, training-free approaches? To explore the solution, we investigate token condensation (TC) techniques, originally designed to enhance vision transformer efficiency by refining token usage during inference. We observe that informative tokens improve visual-text alignment in VLMs like CLIP on unseen datasets. However, existing TC methods often fail to maintain in-distribution…
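The abstract describes token condensation only at a high level, and the description of TCA's own procedure is cut off. To illustrate the generic idea of "refining token usage during inference", here is a minimal sketch of one well-known condensation scheme. It is EViT-style and is not TCA itself. The sketch ranks patch tokens by the attention they receive from the [CLS] token, keeps the top fraction, and fuses the rest into a single attention-weighted token. Several choices here are assumptions made for illustration: single-head attention, plain Python lists as vectors, a toy 2-d input, and the hypothetical helper names `cls_attention` and `condense_tokens`.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cls_attention(cls_query: Sequence[float], patch_keys: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: single-head scaled dot-product attention from [CLS] to each patch."""
    scale = 1.0 / math.sqrt(len(cls_query))
    scores = [scale * sum(q * k for q, k in zip(cls_query, key)) for key in patch_keys]
    return softmax(scores)


def condense_tokens(
    patch_tokens: Sequence[Sequence[float]],
    cls_query: Sequence[float],
    patch_keys: Sequence[Sequence[float]],
    keep_ratio: float,
) -> Tuple[List[Vector], List[int]]:
    """Hypothetical EViT-style condensation: keep the most-attended patches, fuse the rest.

    Returns the condensed token list and the original indices of the kept patches.
    """
    n = len(patch_tokens)
    k = min(n, max(1, round(keep_ratio * n)))
    attn = cls_attention(cls_query, patch_keys)
    ranked = sorted(range(n), key=lambda i: attn[i], reverse=True)
    kept_idx = sorted(ranked[:k])  # restore the original spatial order
    dropped_idx = ranked[k:]

    condensed = [list(patch_tokens[i]) for i in kept_idx]
    if dropped_idx:
        # Fuse inattentive tokens into one, weighted by their [CLS] attention.
        weight_sum = sum(attn[i] for i in dropped_idx)
        dim = len(patch_tokens[0])
        fused = [
            sum(attn[i] * patch_tokens[i][d] for i in dropped_idx) / weight_sum
            for d in range(dim)
        ]
        condensed.append(fused)
    return condensed, kept_idx


# Toy example: 4 patches with 2-d features; keep half of them.
tokens = [[1.0, 0.0], [5.0, 5.0], [2.0, 2.0], [5.0, 5.0]]
keys = [[3.0, 0.0], [0.0, 0.0], [1.0, 0.0], [-2.0, 0.0]]
condensed, kept = condense_tokens(tokens, [1.0, 0.0], keys, keep_ratio=0.5)
print(kept)            # [0, 2]
print(len(condensed))  # 3: two kept patches plus one fused token
```

Fusing the dropped patches, instead of discarding them outright, retains a summary of their content while still shrinking the sequence that later transformer layers must process. That reduction is where GFLOPs savings of the kind reported above come from. The criteria TCA actually uses to select and condense tokens may differ from this sketch.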

Code & Models

Repositories

jo-wang/tca
PyTorch · Official

Taxonomy

Topics: Computer Graphics and Visualization Techniques

Methods: Contrastive Language-Image Pre-training · ALIGN · Pruning