AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection
Bin-Bin Gao, Yue Zhou, Jiangtao Yan, Yuezhi Cai, Weixi Zhang, Meng Wang, Jun Liu, Yong Liu, Lei Wang, Chengjie Wang

TL;DR
AdaptCLIP is a simple method that adapts CLIP for zero- and few-shot visual anomaly detection across diverse domains. It relies on adaptive representations and comparative learning, needs no additional fine-tuning on unseen domains, and outperforms existing methods.
Contribution
It proposes a novel approach that adapts CLIP with three simple adapters and a new learning strategy, enabling effective zero-shot anomaly detection without additional fine-tuning.
Findings
Achieves state-of-the-art results on 12 benchmarks
Supports zero-/few-shot generalization across domains
Outperforms existing methods significantly
Abstract
Universal visual anomaly detection aims to identify anomalies from novel or unseen vision domains without additional fine-tuning, which is critical in open scenarios. Recent studies have demonstrated that pre-trained vision-language models like CLIP exhibit strong generalization with just zero or a few normal images. However, existing methods struggle with designing prompt templates, complex token interactions, or requiring additional fine-tuning, resulting in limited flexibility. In this work, we present a simple yet effective method called AdaptCLIP based on two key insights. First, adaptive visual and textual representations should be learned alternately rather than jointly. Second, comparative learning between query and normal image prompt should incorporate both contextual and aligned residual features, rather than relying solely on residual features. AdaptCLIP treats CLIP models…
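The second insight in the abstract, comparing a query image against a normal image prompt using both contextual and aligned residual features, can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration rather than the authors' implementation: the function names `aligned_residual` and `joint_features` are hypothetical, patch tokens are stand-in random arrays instead of CLIP outputs, alignment is done by nearest-neighbor cosine similarity, and the two feature types are combined by concatenation.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def aligned_residual(query_tokens, prompt_tokens):
    """Align each query patch to its most similar normal-prompt patch.

    query_tokens:  (Nq, D) patch features of the query image
    prompt_tokens: (Np, D) patch features of a normal reference image
    Returns the absolute residual (Nq, D) and the matched prompt indices (Nq,).
    Nearest-neighbor cosine matching is an assumption made for this sketch.
    """
    sim = l2_normalize(query_tokens) @ l2_normalize(prompt_tokens).T  # (Nq, Np)
    nearest = sim.argmax(axis=1)
    residual = np.abs(query_tokens - prompt_tokens[nearest])
    return residual, nearest


def joint_features(query_tokens, prompt_tokens):
    """Combine contextual (query) features with aligned residual features.

    Concatenation is a hypothetical choice; the source only states that both
    kinds of features should be used instead of the residual alone.
    """
    residual, _ = aligned_residual(query_tokens, prompt_tokens)
    return np.concatenate([query_tokens, residual], axis=-1)  # (Nq, 2D)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prompt = rng.normal(size=(16, 32))   # stand-in for normal-image patch tokens
    query = prompt.copy()
    query[3] += 5.0                      # perturb one patch to mimic an anomaly
    residual, _ = aligned_residual(query, prompt)
    # Per-patch residual magnitude; the perturbed patch should stand out.
    print(residual.sum(axis=1).argmax())
```

In this sketch the residual alone says how far a patch is from its best normal match, while the concatenated query features keep the context that the residual discards, which is the intuition the abstract gives for using both.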
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Balanced Selection · Contrastive Language-Image Pre-training
