TL;DR
This paper introduces a hierarchical context-based method for out-of-distribution detection using vision-language models, enabling precise category descriptions and efficient extension to new categories, demonstrated on ImageNet-1K.
Contribution
It proposes a novel hierarchical context framework for OOD detection that improves category description precision and supports scalable, category-extensible recognition within vision-language models.
Findings
CATEX outperforms existing methods on ImageNet-1K
Hierarchical contexts improve OOD detection accuracy
Method enables scalable recognition of thousands of categories
Abstract
The key to OOD detection has two aspects: generalized feature representation and precise category description. Recently, vision-language models such as CLIP have brought significant advances on both fronts, but constructing precise category descriptions is still in its infancy due to the absence of unseen categories. This work introduces two hierarchical contexts, namely perceptual context and spurious context, to carefully describe the precise category boundary through automatic prompt tuning. Specifically, perceptual contexts perceive the inter-category difference (e.g., cats vs apples) for current classification tasks, while spurious contexts further identify spurious OOD samples (similar to a category but not actually members of it) for every single category (e.g., cats vs panthers, apples vs peaches). The two contexts hierarchically construct the precise description for a given category, which is, first…
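The division of labor between the two contexts can be illustrated with a toy sketch. The abstract is truncated before it states the actual decision rule, so everything below is an assumption made for illustration: the function names, the hand-written 3-dimensional embeddings, the use of cosine similarity, and the rule "flag as OOD when the spurious context matches at least as well as the perceptual one". It is not the paper's scoring function, and in the real method the contexts are learned by prompt tuning on top of CLIP rather than written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify_with_contexts(image_feat, perceptual, spurious):
    """Hypothetical two-step decision (an assumption, not the paper's rule).

    perceptual, spurious: dicts mapping category name -> embedding.
    Step 1: pick the category whose perceptual context is most similar.
    Step 2: flag the sample as OOD if that category's spurious context
            matches the image at least as well as its perceptual context.
    Returns (predicted_category, is_ood).
    """
    best = max(perceptual, key=lambda c: cosine(image_feat, perceptual[c]))
    p_sim = cosine(image_feat, perceptual[best])
    s_sim = cosine(image_feat, spurious[best])
    return best, s_sim >= p_sim


# Toy embeddings, invented for this example.
perceptual = {"cat": [1.0, 0.0, 0.0], "apple": [0.0, 1.0, 0.0]}
spurious = {"cat": [0.7, 0.0, 0.7], "apple": [0.0, 0.7, 0.7]}

house_cat = [0.9, 0.1, 0.1]  # close to the "cat" perceptual context
panther = [0.6, 0.0, 0.8]    # cat-like, but closer to the "cat" spurious context

result_in = classify_with_contexts(house_cat, perceptual, spurious)
result_ood = classify_with_contexts(panther, perceptual, spurious)
print(result_in, result_ood)
```

In this toy setup the house-cat feature is assigned to "cat" and kept as in-distribution, while the panther feature is also routed to "cat" by the perceptual step but then flagged by the spurious step, mirroring the cats vs panthers example in the abstract.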
Taxonomy
Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training
