LookSharp: Attention Entropy Minimization for Test-Time Adaptation

Yash Mali; Evan Shelhamer

arXiv:2511.18925·cs.CV·February 10, 2026

TL;DR

LookSharp introduces a novel test-time adaptation method that minimizes attention entropy in transformers, improving robustness to distribution shifts while maintaining performance on clean data.

Contribution

It proposes using attention entropy minimization in transformers as a new TTA objective, complementing output entropy minimization.

Findings

01 Improves robustness on ImageNet-C
02 Maintains performance on clean data
03 Complementary to output entropy minimization

Abstract

Test-time adaptation (TTA) updates models during inference to reduce error on distribution shifts. While entropy minimization over the output distribution has proven effective as a TTA loss, we study using the intermediate distributions computed by transformers in the attention mechanism. We propose LookSharp, which minimizes the entropy of CLS-to-patch attention in the final layer as a novel TTA objective, encouraging the model to maintain focused attention on shifted data. We demonstrate that attention entropy minimization improves robustness on ImageNet-C. We also show that it is complementary to output entropy minimization and maintains performance on clean data.
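
The quantity the abstract describes can be illustrated with a small sketch. This is not the authors' implementation: it only computes the entropy of a CLS query's attention over patch keys, averaged over heads, using plain Python. The function names, the averaging over heads, the choice to normalize over patch keys only, and the toy scores are all assumptions made for illustration. In an actual TTA loop this value would serve as a loss and be backpropagated through the model with an autodiff framework.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p + eps) for p in probs)

def cls_to_patch_attention_entropy(cls_scores_per_head):
    """Mean entropy of CLS-to-patch attention across heads.

    cls_scores_per_head: one list per head, each holding the pre-softmax
    scores of the CLS query against every patch key (hypothetical layout;
    assumes the distribution is normalized over patches only).
    """
    entropies = [entropy(softmax(scores)) for scores in cls_scores_per_head]
    return sum(entropies) / len(entropies)

# Toy example: 2 heads, 4 patches each (made-up scores).
diffuse = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]  # uniform attention
focused = [[8.0, 0.0, 0.0, 0.0], [0.0, 8.0, 0.0, 0.0]]  # one dominant patch

h_diffuse = cls_to_patch_attention_entropy(diffuse)  # close to log(4), the maximum for 4 patches
h_focused = cls_to_patch_attention_entropy(focused)  # much lower: attention is sharp

print(h_diffuse, h_focused)
```

Uniform attention gives the maximum entropy for a given number of patches, while attention concentrated on a few patches gives a value near zero. Minimizing this quantity therefore pushes the CLS token toward the focused attention the abstract refers to.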

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications