LookSharp: Attention Entropy Minimization for Test-Time Adaptation
Yash Mali, Evan Shelhamer

TL;DR
LookSharp introduces a novel test-time adaptation method that minimizes attention entropy in transformers, improving robustness to distribution shifts while maintaining performance on clean data.
Contribution
The paper proposes minimizing attention entropy in transformers as a new test-time adaptation (TTA) objective that complements output entropy minimization.
Findings
Improves robustness on ImageNet-C
Maintains performance on clean data
Complementary to output entropy minimization
Abstract
Test-time adaptation (TTA) updates models during inference to reduce error on distribution shifts. While entropy minimization over the output distribution has proven effective as a TTA loss, we study using the intermediate distributions computed by transformers in the attention mechanism. We propose LookSharp, which minimizes the entropy of CLS-to-patch attention in the final layer as a novel TTA objective, encouraging the model to maintain focused attention on shifted data. We demonstrate that attention entropy minimization improves robustness on ImageNet-C. We also show that it is complementary to output entropy minimization and maintains performance on clean data.
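The quantity described in the abstract, the entropy of CLS-to-patch attention, can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. It assumes standard scaled dot-product attention, a softmax taken over patch keys only (the CLS-to-CLS entry is excluded), and a simple mean over heads; the function names (cls_to_patch_entropy, mean_attention_entropy, output_entropy) are hypothetical. In practice this value would be computed on the final layer's attention inside an autodiff framework and minimized by gradient steps on the model parameters.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def cls_to_patch_entropy(cls_query, patch_keys):
    """Entropy of one head's CLS-to-patch attention distribution.

    cls_query:  list of d floats (the CLS token's query vector).
    patch_keys: list of patch key vectors, each a list of d floats.
    Assumption: the softmax runs over patch keys only.
    """
    scale = 1.0 / math.sqrt(len(cls_query))
    logits = [scale * sum(q * k for q, k in zip(cls_query, key))
              for key in patch_keys]
    return entropy(softmax(logits))

def mean_attention_entropy(cls_queries, patch_keys_per_head):
    # Assumption: heads are aggregated by a plain average.
    values = [cls_to_patch_entropy(q, keys)
              for q, keys in zip(cls_queries, patch_keys_per_head)]
    return sum(values) / len(values)

def output_entropy(class_logits):
    # Entropy of the predicted class distribution, the usual TTA loss
    # that the abstract describes as complementary.
    return entropy(softmax(class_logits))
```

With N patches, uniform attention gives the maximum value log N, and attention concentrated on a single patch gives a value near zero, so minimizing this quantity pushes the CLS token toward focused attention. How the attention and output terms are combined is not stated in the abstract; a weighted sum would be one hypothetical choice.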
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
