Neutralizing Token Aggregation via Information Augmentation for Efficient Test-Time Adaptation
Yizhe Xiong, Zihan Zhou, Yiwen Liang, Hui Chen, Zijia Lin, Tianxiang Hao, Fan Zhang, Jungong Han, Guiguang Ding

TL;DR
This paper introduces NAVIA, a method that augments CLS tokens to recover information lost during token aggregation, enabling efficient and effective test-time adaptation of Vision Transformers with reduced latency.
Contribution
It proposes a novel information augmentation technique for token aggregation in ViTs, backed by theoretical analysis, to improve test-time adaptation efficiency without performance loss.
Findings
NAVIA outperforms state-of-the-art methods by over 2.5% in accuracy.
It reduces inference latency by more than 20%.
It effectively recovers information lost due to token aggregation.
Abstract
Test-Time Adaptation (TTA) has emerged as an effective solution for adapting Vision Transformers (ViTs) to distribution shifts without additional training data. However, existing TTA methods often incur substantial computational overhead, limiting their applicability in resource-constrained real-world scenarios. To reduce inference cost, plug-and-play token aggregation methods merge redundant tokens in ViTs to reduce the total number of processed tokens. Although efficient, these methods suffer from significant performance degradation when directly integrated with existing TTA methods. We formalize this problem as Efficient Test-Time Adaptation (ETTA), seeking to preserve the adaptation capability of TTA while reducing inference latency. In this paper, we first provide a theoretical analysis from a novel mutual information perspective, showing that token aggregation inherently leads to information loss, which…
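To make the idea of token aggregation concrete, here is a minimal sketch of one generic way to merge redundant tokens: repeatedly average the two most cosine-similar patch tokens while leaving the CLS token untouched. This is an illustration only and is not NAVIA or any specific published aggregation method. The function name `merge_most_similar_tokens`, the greedy pairwise strategy, and the unweighted averaging are all assumptions chosen for simplicity.

```python
import math
from typing import List

Token = List[float]


def cosine(a: Token, b: Token) -> float:
    """Cosine similarity between two vectors (small epsilon avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def merge_most_similar_tokens(tokens: List[Token], num_merges: int) -> List[Token]:
    """Hypothetical helper: greedily average the most similar pair of patch tokens.

    tokens[0] is treated as the CLS token and is never merged.
    Each merge reduces the token count by one.
    """
    cls_token = list(tokens[0])
    patches = [list(t) for t in tokens[1:]]
    for _ in range(num_merges):
        if len(patches) < 2:
            break  # nothing left to merge
        # Find the pair (i, j), i < j, with the highest cosine similarity.
        pairs = ((i, j) for i in range(len(patches)) for j in range(i + 1, len(patches)))
        i, j = max(pairs, key=lambda p: cosine(patches[p[0]], patches[p[1]]))
        # Replace the pair with its unweighted mean (an assumption of this sketch).
        merged = [(x + y) / 2.0 for x, y in zip(patches[i], patches[j])]
        patches = [p for k, p in enumerate(patches) if k not in (i, j)] + [merged]
    return [cls_token] + patches


if __name__ == "__main__":
    example = [[9.0, 9.0], [1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
    print(merge_most_similar_tokens(example, num_merges=1))
```

In this toy input the two nearly parallel patch tokens are averaged into one, so the model would process three tokens instead of four. The averaged token no longer distinguishes its two sources, which gives an intuition for the kind of information loss the abstract refers to.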
