Noisy Test-Time Adaptation in Vision-Language Models
Chentao Cao, Zhun Zhong, Zhanke Zhou, Tongliang Liu, Yang Liu, Kun Zhang, Bo Han

TL;DR
This paper introduces AdaND, a novel noise detection framework that enhances zero-shot test-time adaptation and out-of-distribution detection in vision-language models by decoupling classifier and detector training.
Contribution
It proposes a decoupled framework built around AdaND that improves zero-shot noisy test-time adaptation (ZS-NTTA) and zero-shot OOD detection without retraining the classifier.
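The idea of decoupling can be illustrated with a minimal sketch. It assumes toy precomputed image features and ID-class text embeddings rather than real VLM outputs. It uses the frozen classifier's maximum softmax probability to pseudo-label confident samples as clean or noisy, and it updates only a small detector, never the classifier. The class names, the thresholds `lo`/`hi`, and the logistic-regression detector are hypothetical choices made for illustration; they are not AdaND's actual architecture or training recipe.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class FrozenZeroShotClassifier:
    """Stand-in for a frozen VLM: cosine similarity to fixed ID-class text embeddings."""

    def __init__(self, text_emb, scale=100.0):
        self.text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
        self.scale = scale  # hypothetical logit scale

    def probs(self, img_feat):
        img = img_feat / np.linalg.norm(img_feat, axis=1, keepdims=True)
        return softmax(self.scale * img @ self.text_emb.T)


class OnlineNoiseDetector:
    """Hypothetical logistic-regression detector; the only component that is updated."""

    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def prob_noise(self, feat):
        return 1.0 / (1.0 + np.exp(-(feat @ self.w + self.b)))

    def update(self, feat, pseudo):
        # One gradient step on binary cross-entropy; pseudo is 1 for noisy, 0 for clean.
        grad = self.prob_noise(feat) - pseudo
        self.w -= self.lr * feat.T @ grad / len(pseudo)
        self.b -= self.lr * grad.mean()


def adapt_step(clf, det, img_feat, lo=0.4, hi=0.7):
    """Process one test batch: pseudo-label with the frozen model, update only the detector."""
    p = clf.probs(img_feat)
    score = p.max(axis=1)                    # max softmax probability from the frozen model
    confident = (score < lo) | (score > hi)  # keep only confidently pseudo-labeled samples
    if confident.any():
        pseudo = (score[confident] < lo).astype(float)  # low confidence -> "noisy"
        det.update(img_feat[confident], pseudo)
    is_noise = det.prob_noise(img_feat) > 0.5
    return np.where(is_noise, -1, p.argmax(axis=1))  # -1 marks a sample rejected as noise
```

With one-hot toy features (three ID classes, plus noisy samples pointing in an unrelated direction), repeated calls to `adapt_step` keep the zero-shot labels of the clean samples and mark the noisy ones `-1`, while `clf.text_emb` is never modified. Keeping the classifier fixed is the point of the design: pseudo-label errors can only affect the detector, not the classifier's weights.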
Findings
AdaND outperforms state-of-the-art methods in ZS-NTTA and ZS-OOD detection.
Achieves an 8.32% improvement in harmonic mean accuracy for ZS-NTTA on ImageNet.
Provides a computationally efficient solution whose cost is comparable to that of the frozen model.
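The headline result is reported in harmonic mean accuracy. The following is a minimal sketch of how such a metric can be computed. It assumes that the metric is the harmonic mean of an accuracy on clean (ID) samples and a detection accuracy on noisy samples, and that noisy samples carry the label `-1`; this definition is our assumption and is not stated on this page. The names `zs_ntta_accuracies` and `NOISE` are hypothetical.

```python
import numpy as np

NOISE = -1  # hypothetical label meaning "outside the ID label space"


def zs_ntta_accuracies(pred, true):
    """Return (acc_clean, acc_noise, acc_harmonic) for one test stream.

    pred, true: integer sequences; ID samples carry a class index >= 0,
    noisy samples carry NOISE in `true`.
    """
    pred = np.asarray(pred)
    true = np.asarray(true)
    clean = true != NOISE
    # A clean sample counts only if it is kept AND assigned the right class.
    acc_clean = float(np.mean(pred[clean] == true[clean]))
    # A noisy sample counts only if it is flagged as noise.
    acc_noise = float(np.mean(pred[~clean] == NOISE))
    denom = acc_clean + acc_noise
    acc_h = 0.0 if denom == 0 else 2 * acc_clean * acc_noise / denom
    return acc_clean, acc_noise, acc_h
```

Consider a stream in which 3 of 4 clean samples are kept and correctly classified and 1 of 2 noisy samples is flagged. It scores 0.75 and 0.5, which gives a harmonic mean of 0.6. A method that classifies every clean sample correctly but flags no noise scores 0, whereas an arithmetic mean would still report 0.5. This is why the harmonic mean rewards doing well on both sub-tasks at once.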
Abstract
Test-time adaptation (TTA) aims to address distribution shifts between source and target data by relying solely on target data during testing. In open-world scenarios, models often encounter noisy samples, i.e., samples outside the in-distribution (ID) label space. Leveraging the zero-shot capability of pre-trained vision-language models (VLMs), this paper introduces Zero-Shot Noisy TTA (ZS-NTTA), focusing on adapting the model to target data with noisy samples at test time in a zero-shot manner. We find that existing TTA methods underperform under ZS-NTTA, often lagging behind even the frozen model. We conduct comprehensive experiments to analyze this phenomenon, revealing that the negative impact of unfiltered noisy data outweighs the benefits of clean data during model updating. Also, adapting a single classifier for both ID classification and noise detection hampers both sub-tasks. Built on…
Peer Reviews
Decision: ICLR 2025 Poster
1. This work studies a challenging task, zero-shot noisy test-time adaptation, which is practical and applicable in real-world applications. 2. The three observations are interesting and novel. They reveal why ZS-NTTA is difficult and contribute to the community. 3. The proposed method is reasonable and achieves good results in the experiments.
1. The paper studies the ranking distribution of different methods in Fig. 2. Can these rankings really establish the superiority of these methods? It is possible that one method is good at dealing with challenging datasets and another is good at easier ones. Why not evaluate them using absolute accuracy instead of ranks? 2. How noise is injected into the data is an important factor in the proposed method. Is there any theoretical analysis? 3. It is better to st
1. The proposed task addresses issues commonly encountered in real-world applications of classification models, which are often overlooked in current research. For instance, existing OOD detection methods assume a clean stream when reporting accuracy, and CLIP models are applied to noisy TTA tasks in a zero-shot manner. 2. Although the method of training an additional noise detector is simple, it proves to be quite effective according to the experimental results. 3. There is improved performance i
1. I have concerns regarding the name of the task. The ultimate goal of the proposed ZS-NTTA task is to detect OOD samples and correctly classify in-distribution (ID) samples. I understand the definitions of noisy TTA and OOD detection, where both aim to handle test sets containing both ID and OOD cases. In my view, TTA leans more towards a methodological strategy focusing on enhancing model performance and adaptability during testing, while OOD detection leans towards a task definiti
The paper is well-organized, and the idea of the Zero-Shot Noisy TTA setting is novel. It offers a comprehensive analysis of why existing methods struggle under Zero-Shot Noisy TTA, and this analysis is easy to follow. The results demonstrate a notable improvement across many benchmarks, and the proposed method AdaND is computationally efficient.
Though Zero-Shot Noisy TTA is a new task, the method of using two stages and pseudo-labels to train a noise detector is not novel; the idea of using unlabeled test data to train a more robust OOD classifier has been explored in other scenarios, such as [1] for zero-shot OOD detection. It would be better to provide more experiments on more complex datasets, such as near ID/OOD datasets, for example, ImageNet and NINCO [2]. Some recent test-time adaptation works are not compared in the experi
Taxonomy
Topics: Multimodal Machine Learning Applications
