TL;DR
TINS introduces a test-time method that learns sample-specific negative semantics for improved out-of-distribution (OOD) detection in vision-language models, addressing the limitations of static negative labels.
Contribution
The paper proposes TINS, a novel test-time approach that dynamically learns and regularizes negative semantics, improving OOD detection over existing methods that rely on static negative labels.
Findings
TINS significantly reduces FPR95 (the false positive rate on OOD samples at a 95% true positive rate on ID samples) on the Four-OOD benchmark, from 14.04% to 6.72%.
Extensive experiments show consistent improvements across multiple OOD detection benchmarks.
The method effectively stabilizes negative semantics expansion through group-wise aggregation and buffer updates.
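The findings above involve two scoring-side pieces: a group-wise aggregated OOD score fed by an expanding set of negatives, and the FPR95 metric used to report results. The sketch below illustrates both under explicit assumptions and is not the paper's implementation. The per-group score uses a standard negative-label rule (softmax probability mass on ID labels when ID and negative label embeddings compete). The disjoint random grouping, the temperature `tau=0.01`, the default `num_groups=2`, and the FIFO eviction in the hypothetical `NegativeBuffer` class are all illustrative choices. Embeddings are plain Python lists standing in for CLIP-style vectors, and the FPR95 tie handling (accept scores `>=` the threshold) is one common convention among several.

```python
import math
import random
from collections import deque
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def id_mass(image: Vector, id_text: Sequence[Vector], neg_text: Sequence[Vector], tau: float) -> float:
    """Softmax probability mass on ID labels when ID and negative labels compete."""
    logits = [cosine(image, t) / tau for t in [*id_text, *neg_text]]
    top = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - top) for z in logits]
    return sum(exps[: len(id_text)]) / sum(exps)


def groupwise_score(image, id_text, neg_text, num_groups=2, tau=0.01, seed=0) -> float:
    """Average the ID mass over disjoint random groups of negatives (higher = more ID-like)."""
    negs = list(neg_text)
    random.Random(seed).shuffle(negs)
    groups = [g for g in (negs[i::num_groups] for i in range(num_groups)) if g]
    if not groups:  # no negatives at all: every image is scored as fully ID
        return 1.0
    return sum(id_mass(image, id_text, g, tau) for g in groups) / len(groups)


class NegativeBuffer:
    """Hypothetical bounded FIFO store for negative embeddings learned at test time."""

    def __init__(self, capacity: int):
        self._items: deque = deque(maxlen=capacity)

    def push(self, emb: Vector) -> None:
        self._items.append(list(emb))  # the oldest entry is evicted once the buffer is full

    def snapshot(self) -> List[Vector]:
        return list(self._items)


def fpr_at_tpr(id_scores: Sequence[float], ood_scores: Sequence[float], tpr: float = 0.95) -> float:
    """FPR95: share of OOD samples accepted at the threshold that keeps `tpr` of ID samples."""
    ranked = sorted(id_scores, reverse=True)
    k = max(1, math.ceil(tpr * len(ranked) - 1e-9))  # epsilon guards float error in tpr * n
    threshold = ranked[k - 1]
    return sum(1 for s in ood_scores if s >= threshold) / len(ood_scores)
```

Averaging over several smaller groups limits how much any single noisy negative can move the final score, which is one plausible reading of why grouping helps stabilize an expanding negative set.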
Abstract
Vision-language models enable OOD detection by comparing image alignment with ID labels and negative semantics. Existing negative-label-based methods mainly rely on static negative labels constructed before inference, limiting their ability to cover diverse and evolving OOD concepts. Although test-time expansion provides a natural solution, naively learning negative semantics from potential OOD samples may introduce hard ID contamination. To address this issue, we propose a Test-time ID-prototype-separated Negative Semantics learning method, termed TINS. TINS learns sample-specific negative text embeddings via image-to-text modality inversion and introduces ID-prototype-separated regularization to keep them separated from ID semantics. To further stabilize negative semantics expansion, TINS employs group-wise aggregation scoring and a buffer…
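The abstract pairs image-to-text inversion (a negative embedding pulled toward the test image) with a regularizer that keeps that embedding away from ID semantics. The following is a minimal sketch of that trade-off under stated assumptions, not the TINS objective. It optimizes the embedding vector directly instead of inverting through a text encoder. It treats normalized vectors passed as `id_prototypes` as the ID prototypes, and it uses a hinge penalty with a hypothetical `margin` and weight `lam`. The loss is minimized by projected gradient descent on the unit sphere. The function name `learn_negative_embedding` and all default values are illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def learn_negative_embedding(
    image: Sequence[float],
    id_prototypes: Sequence[Sequence[float]],
    margin: float = 0.2,  # hypothetical: max tolerated cosine to any ID prototype
    lam: float = 1.0,  # hypothetical: weight of the separation penalty
    lr: float = 0.1,
    steps: int = 200,
) -> Vector:
    """Projected gradient descent on the unit sphere for
    loss(n) = -<n, v> + lam * sum_k max(0, <n, p_k> - margin).
    """
    v = normalize(image)
    protos = [normalize(p) for p in id_prototypes]
    n = list(v)  # start at the image embedding (pure inversion, no separation yet)
    for _ in range(steps):
        grad = [-x for x in v]  # gradient of the alignment term -<n, v>
        for p in protos:
            if dot(n, p) > margin:  # hinge active: n sits too close to this ID prototype
                grad = [g + lam * x for g, x in zip(grad, p)]
        # Gradient step, then project back onto the unit sphere;
        # the vector stays nonzero as long as lr * (1 + lam * len(protos)) < 1.
        n = normalize([x - lr * g for x, g in zip(n, grad)])
    return n
```

With an image embedding at 45 degrees to a single ID prototype, the learned negative drifts away from the prototype until its cosine sits near `margin`, while remaining well aligned with the image. When no prototype is within the margin, the result is simply the normalized image embedding.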
