AF-CLIP: Zero-Shot Anomaly Detection via Anomaly-Focused CLIP Adaptation
Qingqing Fang, Wenxi Lv, Qinliang Su

TL;DR
AF-CLIP adapts CLIP for zero-shot and few-shot visual anomaly detection by refining its visual features with a lightweight adapter, multi-scale aggregation, and learnable prompts, improving both anomaly localization and image-level classification.
Contribution
The paper introduces AF-CLIP, a method that adapts CLIP's visual features to focus on local anomalies using a lightweight adapter and multi-scale aggregation, with the goal of improving zero-/few-shot anomaly detection.
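To make the two components named above more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the window sizes, the averaging scheme, and the bottleneck form of the adapter are all assumptions chosen for illustration. It only shows one common way to pool patch features over several neighbourhood sizes and then pass them through a small residual adapter.

```python
import numpy as np

def aggregate_multiscale(patches, window_sizes=(1, 3, 5)):
    """Average each patch feature with its neighbours at several window sizes.

    patches: array of shape (H, W, D), one feature vector per image patch.
    window_sizes: odd window widths (hypothetical values, not from the paper).
    Returns an (H, W, D) array: the mean over scales of window-averaged features.
    """
    H, W, D = patches.shape
    out = np.zeros((H, W, D), dtype=np.float64)
    for r in window_sizes:
        pad = r // 2
        # Replicate edge patches so border positions keep a full window.
        padded = np.pad(patches, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
        pooled = np.zeros((H, W, D), dtype=np.float64)
        for dy in range(r):
            for dx in range(r):
                pooled += padded[dy:dy + H, dx:dx + W, :]
        out += pooled / (r * r)
    return out / len(window_sizes)

def residual_adapter(features, w_down, w_up, alpha=0.1):
    """Assumed bottleneck adapter: x + alpha * relu(x @ w_down) @ w_up."""
    hidden = np.maximum(features @ w_down, 0.0)
    return features + alpha * (hidden @ w_up)

# Usage: a 4x4 grid of 8-dimensional patch features with random adapter weights.
rng = np.random.default_rng(0)
patch_features = rng.normal(size=(4, 4, 8))
w_down = rng.normal(size=(8, 2)) * 0.1
w_up = rng.normal(size=(2, 8)) * 0.1
adapted = residual_adapter(aggregate_multiscale(patch_features), w_down, w_up)
print(adapted.shape)
```

The residual form keeps the adapted features close to the original CLIP features when `alpha` is small, which is one reason such adapters are often described as lightweight; whether AF-CLIP uses this exact form is not stated in the text here.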
Findings
Effective zero-shot anomaly detection across industrial and medical datasets.
Improved localization accuracy for anomalies of different sizes.
Demonstrated generalization and robustness of the method.
Abstract
Visual anomaly detection has been widely used in industrial inspection and medical diagnosis. Existing methods typically demand substantial training samples, limiting their utility in zero-/few-shot scenarios. While recent efforts have leveraged CLIP's zero-shot recognition capability for this task, they often neglect to optimize visual features to focus on local anomalies, which reduces their efficacy. In this work, we propose AF-CLIP (Anomaly-Focused CLIP), which dramatically enhances CLIP's visual representations to focus on local defects. Our approach introduces a lightweight adapter that emphasizes anomaly-relevant patterns in visual features, simultaneously optimizing both class-level features for image classification and patch-level features for precise localization. To capture anomalies of different sizes and improve detection accuracy, prior to the adapter, we develop a multi-scale spatial…
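As a rough illustration of how class-level and patch-level features can both be scored in a CLIP-style zero-shot setup, the sketch below compares visual features against two text embeddings, one for a "normal" prompt and one for an "abnormal" prompt. This is a generic recipe, not the method described in the paper: `anomaly_probability`, the two-prompt setup, and the temperature value are assumptions made for the example, and the embeddings are toy vectors rather than real CLIP outputs.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def anomaly_probability(visual, text_normal, text_abnormal, temperature=0.07):
    """Softmax probability that each visual feature matches the abnormal prompt.

    visual: array of shape (..., D), e.g. (D,) for a class-level feature
            or (H, W, D) for patch-level features.
    text_normal, text_abnormal: text embeddings of shape (D,).
    temperature: illustrative value, an assumption for this sketch.
    """
    v = l2_normalize(visual)
    t = l2_normalize(np.stack([text_normal, text_abnormal]))  # (2, D)
    logits = (v @ t.T) / temperature                           # (..., 2)
    logits = logits - logits.max(axis=-1, keepdims=True)       # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    return probs[..., 1]

# Toy embeddings: axis 0 stands in for "normal", axis 1 for "abnormal".
dim = 4
text_normal = np.eye(dim)[0]
text_abnormal = np.eye(dim)[1]

# Patch-level features: every patch looks normal except the one at (0, 0).
patch_features = np.tile(text_normal, (2, 2, 1))
patch_features[0, 0] = text_abnormal

anomaly_map = anomaly_probability(patch_features, text_normal, text_abnormal)
image_score = anomaly_probability(patch_features.mean(axis=(0, 1)),
                                  text_normal, text_abnormal)
print(anomaly_map.shape, float(image_score))
```

The same function yields a per-patch anomaly map for localization and a single score for image-level classification; here the image-level feature is simply the mean of the patches, which stands in for a real class token.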
