TL;DR
This paper introduces TTL, a novel framework that dynamically learns and updates OOD textual semantics during test time for improved detection with vision-language models, without relying on external labels.
Contribution
TTL enables adaptive OOD detection by learning from unlabeled test streams and maintaining a textual knowledge bank, advancing beyond fixed-label methods.
Findings
TTL achieves state-of-the-art OOD detection performance.
The OOD knowledge purification strategy effectively reduces the noise introduced by pseudo-labels.
Experiments on multiple benchmarks validate TTL's robustness.
Abstract
Vision-language models (VLMs) such as CLIP exhibit strong out-of-distribution (OOD) detection capabilities by aligning visual and textual representations. Recent CLIP-based test-time adaptation methods further improve detection performance by incorporating external OOD labels. However, such labels are finite and fixed, while the real OOD semantic space is inherently open-ended. Consequently, fixed labels fail to represent the diverse and evolving OOD semantics encountered in test streams. To address this limitation, we introduce Test-time Textual Learning (TTL), a framework that dynamically learns OOD textual semantics from unlabeled test streams, without relying on external OOD labels. TTL updates learnable prompts using pseudo-labeled test samples to capture emerging OOD knowledge. To suppress noise introduced by pseudo-labels, we introduce an OOD knowledge purification strategy that…
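As a rough illustration of the general setting the abstract describes, the sketch below scores test images against ID and OOD text embeddings in a shared embedding space and assigns pseudo-labels only to confident samples. It is not the TTL method: it does not learn prompts, maintain a knowledge bank, or perform purification. The function names, the temperature, and the `low`/`high` thresholds are hypothetical choices made for this example, and the embeddings are toy vectors rather than CLIP outputs.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def ood_scores(image_emb, id_text_emb, ood_text_emb, temperature=0.01):
    """Softmax mass each image places on the OOD text embeddings.

    Assumption: ood_text_emb stands in for whatever OOD text
    representations are available (for example, learned prompts).
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(np.concatenate([id_text_emb, ood_text_emb], axis=0))
    logits = img @ txt.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return probs[:, id_text_emb.shape[0]:].sum(axis=1)

def pseudo_label(scores, low=0.1, high=0.9):
    """Hypothetical thresholds: 0 = ID, 1 = OOD, -1 = too uncertain to use."""
    labels = np.full(scores.shape, -1, dtype=int)
    labels[scores <= low] = 0
    labels[scores >= high] = 1
    return labels

# Toy 4-d embeddings: two ID classes, one OOD concept.
id_text = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0]])
ood_text = np.array([[0.0, 0.0, 1.0, 0.0]])
images = np.array([[1.0, 0.0, 0.0, 0.0],   # matches an ID class
                   [0.0, 0.0, 1.0, 0.0],   # matches the OOD concept
                   [1.0, 0.0, 1.0, 0.0]])  # ambiguous between the two

scores = ood_scores(images, id_text, ood_text)
labels = pseudo_label(scores)
```

Leaving ambiguous samples unlabeled is one simple way to limit pseudo-label noise in a sketch like this; the paper's purification strategy may work quite differently.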
