Recent Advances in Out-of-Distribution Detection with CLIP-Like Models: A Survey
Chaohua Li, Enhao Zhang, Chuanxing Geng, Songcan Chen

TL;DR
This survey reviews recent progress in out-of-distribution detection using CLIP-like vision-language models, proposing a new categorization framework based on image and text modality utilization, and discusses future research directions.
Contribution
It introduces a novel categorization scheme for CLIP-based OOD detection methods based on modality usage and training strategies, and highlights open problems and future research directions.
Findings
New categorization framework for CLIP-based OOD detection methods
Identification of open problems in cross-modal OOD detection
Discussion of promising future research directions
Abstract
Out-of-distribution (OOD) detection is a pivotal task for real-world applications: models are trained to identify test-time samples that are distributionally different from the in-distribution (ID) data. Recent advances in AI, particularly Vision-Language Models (VLMs) like CLIP, have revolutionized OOD detection by shifting from traditional unimodal image detectors to multimodal image-text detectors. This shift has inspired extensive research; however, existing categorization schemes (e.g., few- or zero-shot types) still rely solely on the availability of ID images, adhering to a unimodal paradigm. To better align with CLIP's cross-modal nature, we propose a new categorization framework rooted in both image and text modalities. Specifically, we categorize existing methods based on how visual and textual information of OOD data is utilized within image + text modalities, and further…
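To make the phrase "multimodal image-text detector" concrete, here is a minimal sketch of one generic way an image-text similarity can be turned into an OOD score. It is an illustration only, not a method taken from the survey. It assumes the image and the ID class prompts have already been embedded by some CLIP-like encoder; the toy 3-dimensional vectors, the function names, and the `temperature` and `threshold` values are all hypothetical placeholders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def max_softmax_score(image_emb, text_embs, temperature=1.0):
    """Largest softmax probability over image-to-text cosine similarities.

    A high value means the image matches one ID class prompt much better
    than the others; a low value means no ID prompt stands out.
    """
    sims = [cosine(image_emb, t) / temperature for t in text_embs]
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    return max(exps) / sum(exps)

# Toy stand-ins for encoder outputs (assumption: real embeddings would
# come from a CLIP-like image encoder and text encoder).
id_text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two ID class prompts
id_like_image = [0.9, 0.1, 0.0]   # close to the first ID prompt
ood_like_image = [0.0, 0.0, 1.0]  # orthogonal to every ID prompt

score_id = max_softmax_score(id_like_image, id_text_embs, temperature=0.1)
score_ood = max_softmax_score(ood_like_image, id_text_embs, temperature=0.1)

threshold = 0.8  # hypothetical; in practice chosen on held-out ID data
is_ood = [score < threshold for score in (score_id, score_ood)]
```

In this toy setup only ID class names are used on the text side. The categorization described in the abstract concerns how visual and textual information about OOD data is additionally brought in, which this sketch deliberately leaves out.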
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Distributed Sensor Networks and Detection Algorithms · Image and Signal Denoising Methods
Methods: Contrastive Language-Image Pre-training · ALIGN
