FADE: Few-shot/zero-shot Anomaly Detection Engine using Large Vision-Language Model
Yuanwei Li, Elizaveta Ivanova, Martins Bruveris

TL;DR
FADE leverages large vision-language models, specifically CLIP, to enable effective zero-shot and few-shot anomaly detection in industrial images by adapting multi-scale embeddings and ensemble text prompts.
Contribution
This paper introduces FADE, a novel approach that adapts CLIP for industrial anomaly detection, improving language-guided segmentation and combining vision and language cues.
Findings
Outperforms state-of-the-art methods in anomaly segmentation
Achieves a pixel-AUROC of 89.6% in the zero-shot setting
Achieves a pixel-AUROC of 95.4% in the 1-normal-shot setting
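For readers unfamiliar with the metric, pixel-AUROC can be read as the probability that a randomly chosen anomalous pixel receives a higher anomaly score than a randomly chosen normal pixel. The sketch below is a generic rank-based computation, not the paper's evaluation code; the function name `pixel_auroc` and the flat-list input layout are assumptions made for illustration.

```python
def pixel_auroc(scores, labels):
    """Rank-based AUROC over pixels.

    scores: flat list of per-pixel anomaly scores (higher = more anomalous).
    labels: flat list of ground-truth values, 1 for anomalous and 0 for normal.
    Both the name and the input layout are hypothetical, chosen for this sketch.
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one anomalous and one normal pixel")

    # Sum the 1-based ranks of anomalous pixels, giving tied scores their average rank.
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # ranks i+1 .. j share this average
        rank_sum += avg_rank * sum(1 for k in range(i, j) if pairs[k][1])
        i = j

    # Mann-Whitney U statistic, normalized to [0, 1].
    u = rank_sum - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)
```

A value of 1.0 means every anomalous pixel outscores every normal pixel, and 0.5 corresponds to chance-level ranking.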
Abstract
Automatic image anomaly detection is important for quality inspection in the manufacturing industry. The usual unsupervised anomaly detection approach is to train a model for each object class using a dataset of normal samples. However, a more realistic problem is zero-/few-shot anomaly detection where zero or only a few normal samples are available. This makes the training of object-specific models challenging. Recently, large foundation vision-language models have shown strong zero-shot performance in various downstream tasks. While these models have learned complex relationships between vision and language, they are not specifically designed for the tasks of anomaly detection. In this paper, we propose the Few-shot/zero-shot Anomaly Detection Engine (FADE) which leverages the vision-language CLIP model and adjusts it for the purpose of industrial anomaly detection. Specifically, we…
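As a rough illustration of language-guided zero-shot scoring with ensemble text prompts, the sketch below averages several prompt embeddings per class and scores an image patch by a softmax over its cosine similarities to the "normal" and "anomalous" text embeddings. This is the generic CLIP-style recipe rather than FADE's exact procedure, which the truncated abstract does not spell out. The embeddings are toy vectors standing in for CLIP encoder outputs, and the function names and the `temperature` value of 0.07 are assumptions.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def ensemble(prompt_embeddings):
    """Prompt ensemble: average the unit-normalized embeddings, then renormalize."""
    normed = [l2_normalize(e) for e in prompt_embeddings]
    dim = len(normed[0])
    mean = [sum(e[d] for e in normed) / len(normed) for d in range(dim)]
    return l2_normalize(mean)

def anomaly_score(patch_embedding, normal_text, anomalous_text, temperature=0.07):
    """Softmax probability of the 'anomalous' class for one patch embedding.

    temperature is a hypothetical value for this sketch, not taken from the paper.
    """
    p = l2_normalize(patch_embedding)
    logit_normal = sum(a * b for a, b in zip(p, normal_text)) / temperature
    logit_anom = sum(a * b for a, b in zip(p, anomalous_text)) / temperature
    m = max(logit_normal, logit_anom)  # subtract the max for numerical stability
    e_normal = math.exp(logit_normal - m)
    e_anom = math.exp(logit_anom - m)
    return e_anom / (e_normal + e_anom)

# Toy 2-D stand-ins for text embeddings of several prompts per class.
normal_text = ensemble([[1.0, 0.0], [0.9, 0.1]])
anomalous_text = ensemble([[0.0, 1.0], [0.1, 0.9]])
```

Applying `anomaly_score` to every patch embedding of an image and arranging the results on the patch grid would give a coarse anomaly map; a patch whose embedding lies closer to the anomalous ensemble scores above 0.5.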
Taxonomy
Topics: Anomaly Detection Techniques and Applications · COVID-19 diagnosis using AI · Cell Image Analysis Techniques
Methods: Contrastive Language-Image Pre-training
