Adaptive Debiasing Tsallis Entropy for Test-Time Adaptation

Xiangyu Wu; Dongming Jiang; Feng Yu; Yueying Tian; Jiaqi Tang; Qing-Guo Chen; Yang Yang; Jianfeng Lu

arXiv:2602.11743·cs.CV·February 13, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces Adaptive Debiasing Tsallis Entropy (ADTE), a novel method for test-time adaptation of vision-language models that addresses bias in uncertainty estimation by customizing a non-extensive entropy measure for improved performance across diverse benchmarks.

Contribution

The paper proposes ADTE, an adaptive, bias-aware entropy measure that improves test-time adaptation of models like CLIP without hyperparameter tuning.

Findings

01

ADTE outperforms state-of-the-art methods on ImageNet and variants.

02

ADTE achieves the highest average performance on 10 cross-domain benchmarks.

03

Both TE and ADTE serve as effective alternatives to Shannon Entropy in TTA.

Abstract

Mainstream Test-Time Adaptation (TTA) methods for adapting vision-language models, e.g., CLIP, typically rely on Shannon Entropy (SE) at test time to measure prediction uncertainty and inconsistency. However, since CLIP has a built-in bias from pretraining on highly imbalanced web-crawled data, SE inevitably produces biased uncertainty estimates. To address this issue, we find and demonstrate that Tsallis Entropy (TE), a generalized form of SE, is naturally suited for characterizing biased distributions by introducing a non-extensive parameter q, with the performance of SE serving as a lower bound for TE. Building upon this, we generalize TE into Adaptive Debiasing Tsallis Entropy (ADTE) for TTA, customizing for each category a class-specific parameter q^l derived by normalizing the label bias estimated from continuously incoming test instances. This…
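To make the relationship between SE and TE concrete, the following is a minimal sketch using the standard textbook definition of Tsallis Entropy, S_q(p) = (1 - sum_i p_i^q) / (q - 1), which recovers Shannon Entropy in the limit q → 1. It is not the paper's implementation and does not include the class-specific q^l of ADTE; the function names and the `eps` smoothing constant are illustrative assumptions.

```python
import numpy as np


def shannon_entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a probability vector p."""
    p = np.asarray(p, dtype=np.float64)
    # eps avoids log(0); it is an illustrative smoothing choice.
    return float(-np.sum(p * np.log(p + eps)))


def tsallis_entropy(p, q, eps=1e-12):
    """Textbook Tsallis entropy with a single scalar parameter q.

    Falls back to Shannon entropy when q is (numerically) 1,
    since the Tsallis formula is a 0/0 form at exactly q = 1.
    """
    p = np.asarray(p, dtype=np.float64)
    if abs(q - 1.0) < 1e-8:
        return shannon_entropy(p, eps)
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))


if __name__ == "__main__":
    uniform = np.full(4, 0.25)
    peaked = np.array([0.97, 0.01, 0.01, 0.01])
    for q in (0.5, 1.0, 2.0):
        print(q, tsallis_entropy(uniform, q), tsallis_entropy(peaked, q))
```

Varying q changes how strongly low-probability classes contribute to the uncertainty score, which is the degree of freedom the abstract describes ADTE as setting per class.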

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. Extensive experiments on ImageNet, its variants, and 10 cross-domain datasets demonstrate the effectiveness of the proposed method.
2. The approach is simple yet effective.

Weaknesses

1. To further demonstrate the effectiveness of the proposed method, it would be better to conduct more experiments on the ImageNet-C, CIFAR10-C, and CIFAR100-C datasets.
2. The paper overlooks prior studies that have explored the use of Tsallis Entropy in test-time adaptation and domain adaptation. References [1–2] should be discussed in the related works section to better contextualize the novelty of ADTE.
3. The claim of requiring no hyperparameter tuning is somewhat misleading, as the method

Reviewer 02 · Rating 4 · Confidence 3

Strengths

- Effectively addresses the problem of biased predictions by generalizing SE to TE
- Plug-and-play design which can integrate into existing TTA frameworks without retraining
- Achieves consistent performance improvements across OOD and cross-domain benchmarks
- Computationally lightweight, minimal additional cost or tuning required

Weaknesses

- In TTA settings like ZERO, where adaptation relies only on confident-view selection per instance, considering class-wise bias may be less relevant or overcomplicated.
- The paper lacks in-depth analysis explaining why ADTE particularly improves performance on cross-domain tasks, despite showing larger gains there in ablations.
- When LA is removed, performance becomes similar to competitive baselines, raising doubts about whether ADTE itself is a truly effective standalone TTA solution.
- The

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The motivation and proposed method are intuitive and interesting. This paper adopts an intuitive approach of using parameter-free TE to address the prediction bias problem in CLIP. Despite its simple concept, the method proves to be effective. Since TE is a generalization of SE, ADTE can be incorporated into existing SE-based TTA methods. In an era where large-scale models trained on web-scale datasets with inherent prediction biases are mainstream, employing TE is a reasonable approach.

Weaknesses

This paper lacks experiments with models other than CLIP, raising concerns about the applicability of ADTE. For example, can ADTE be applied to unimodal models pretrained on ImageNet or subsequent models of CLIP, such as SigLIP? ADTE is a method specialized for scenarios with predictive bias, but how common are such scenarios? It is desirable to have a discussion about scenarios where ADTE is effective (such as model or dataset distributions).

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications