TL;DR
This paper introduces CoDiRe, a novel framework for continual test-time adaptation that uses a robust distillation process guided by a vision-language model and optimal transport to improve model stability and performance under distribution shifts.
Contribution
The paper proposes CoDiRe, a distillation-based method that fuses multiple models and rectifies their predictions to improve continual test-time adaptation.
Findings
CoDiRe outperforms state-of-the-art methods like CoTTA by 10.55% on ImageNet-C.
It achieves this with only 48% of the computational cost of CoTTA.
The framework effectively mitigates issues like the Generalist Trap and Entropy Bias.
Abstract
Deep neural networks often suffer performance degradation upon deployment due to distribution shifts. Continual Test-Time Adaptation (CTTA) aims to address this issue in an unsupervised manner. However, existing methods that rely on self-supervision are prone to an inherent self-referential feedback loop that amplifies initial prediction errors, leading to model drift. We revisit this limitation and propose Test-Time Distillation (TTD), which reframes adaptation as a distillation process guided by a frozen Vision-Language Model (VLM) as an external signal. While promising, we find that direct distillation is fraught with two pitfalls: (1) the Generalist Trap, where the VLM's broad but non-specialized knowledge leads to suboptimal performance on specific tasks and shifts; and (2) the Entropy Bias, where naive model fusion techniques based on entropy fail due to the disparate calibration…
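To make the Entropy Bias concrete, the following is a minimal sketch of one plausible form of naive entropy-based fusion. It is not taken from the paper: the weighting rule (`exp(-entropy)`), the function names, and the logits are all illustrative assumptions. It shows how a sharply peaked (possibly overconfident) task model can dominate a flatter VLM prediction simply because the two models are calibrated differently.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_weighted_fusion(p_a, p_b):
    """Hypothetical naive fusion: weight each model by exp(-entropy).

    Lower entropy is treated as higher reliability, which is exactly
    the assumption that breaks when calibration differs across models.
    """
    w_a = math.exp(-entropy(p_a))
    w_b = math.exp(-entropy(p_b))
    z = w_a + w_b
    w_a, w_b = w_a / z, w_b / z
    fused = [w_a * a + w_b * b for a, b in zip(p_a, p_b)]
    return fused, (w_a, w_b)

# Illustrative logits for a 3-class problem (made-up values).
task_logits = [4.0, 0.5, 0.2]  # sharp distribution, predicts class 0
vlm_logits = [0.2, 0.9, 0.3]   # flat distribution, predicts class 1

p_task = softmax(task_logits)
p_vlm = softmax(vlm_logits)
fused, weights = entropy_weighted_fusion(p_task, p_vlm)

print("weights (task, vlm):", weights)
print("fused prediction:", fused.index(max(fused)))
```

In this toy setting the task model receives the larger weight and the fused prediction follows it, regardless of which model is actually correct. Confidence alone decides the outcome, which is the failure mode the abstract attributes to entropy-based fusion.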
