Automatic Prompt Generation and Grounding Object Detection for Zero-Shot Image Anomaly Detection
Tsun-Hin Cheung, Ka-Chun Fung, Songjiang Lai, Kwan-Ho Lin, Vincent Ng, Kin-Man Lam

TL;DR
This paper introduces a zero-shot, training-free method for industrial image anomaly detection using a multimodal pipeline with GPT-3, Grounding DINO, and CLIP, enabling scalable and objective quality control.
Contribution
The paper presents a novel multimodal approach combining large language models, grounding object detection, and image-text matching for zero-shot anomaly detection in industrial images.
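The contribution is the composition of three off-the-shelf models rather than any new training, so the data flow can be sketched as plain glue code. The sketch below is illustrative only. The callables `generate_prompts`, `locate`, and `score` are hypothetical stand-ins for GPT-3, Grounding DINO, and CLIP. The normalized `(cx, cy, w, h)` box format, the whole-image fallback when nothing is detected, and the "most anomalous crop wins" aggregation are all assumptions, not details taken from the paper. Images are plain nested lists so the example runs without any model weights.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumed representations: an image is a row-major 2-D list of pixel values,
# and a detector box is (cx, cy, w, h) normalized to [0, 1].
Image = List[List[float]]
Box = Tuple[float, float, float, float]


@dataclass
class Prompts:
    normal: List[str]    # descriptions of a defect-free product
    abnormal: List[str]  # descriptions of a defective product


def cxcywh_to_pixels(box: Box, width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a normalized (cx, cy, w, h) box to clamped pixel corners (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    x0 = max(0, int(round((cx - w / 2) * width)))
    y0 = max(0, int(round((cy - h / 2) * height)))
    x1 = min(width, int(round((cx + w / 2) * width)))
    y1 = min(height, int(round((cy + h / 2) * height)))
    return x0, y0, x1, y1


def crop(image: Image, box: Box) -> Image:
    """Cut the region covered by `box` out of `image`."""
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = cxcywh_to_pixels(box, width, height)
    return [row[x0:x1] for row in image[y0:y1]]


def detect_anomaly(
    image: Image,
    product: str,
    generate_prompts: Callable[[str], Prompts],     # hypothetical stand-in for the LLM
    locate: Callable[[Image, str], Sequence[Box]],  # hypothetical stand-in for the detector
    score: Callable[[Image, Prompts], float],       # hypothetical stand-in for image-text matching
) -> float:
    """Return an anomaly score for `image`; higher means more likely defective."""
    prompts = generate_prompts(product)
    boxes = locate(image, product)
    if not boxes:
        # Assumed fallback: score the full image when the product is not found.
        return score(image, prompts)
    # Assumed aggregation: the most anomalous detected crop decides the image score.
    return max(score(crop(image, b), prompts) for b in boxes)
```

Injecting the three models as callables keeps the sketch training-free in the same sense as the described method: every stage is a frozen model queried at inference time, and swapping one model does not affect the other two.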
Findings
Achieves high accuracy on the MVTec-AD and VisA datasets.
Operates without any model training or fine-tuning.
Provides an efficient and scalable solution for industrial quality control.
Abstract
Identifying defects and anomalies in industrial products is a critical quality control task. Traditional manual inspection methods are slow, subjective, and error-prone. In this work, we propose a novel zero-shot, training-free approach for automated industrial image anomaly detection using a multimodal machine learning pipeline consisting of three foundation models. Our method first uses a large language model, GPT-3, to generate text prompts describing the expected appearances of normal and abnormal products. We then use a grounding object detection model, Grounding DINO, to locate the product in the image. Finally, we compare the cropped product image patches to the generated prompts using a zero-shot image-text matching model, CLIP, to identify any anomalies. Our experiments on two datasets of industrial product images, namely MVTec-AD and VisA, demonstrate the…
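The final step, comparing a cropped patch against the normal and abnormal prompts, can be illustrated with a common CLIP-style zero-shot scoring scheme: cosine similarity between the image embedding and each prompt embedding, a temperature-scaled softmax over all prompts, and the probability mass assigned to the abnormal prompts as the anomaly score. This is a minimal sketch under stated assumptions. The abstract does not specify the paper's exact scoring rule, the embeddings here are toy vectors rather than real CLIP outputs, and the default temperature of 0.01 (a logit scale of 100) is an assumed value.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def anomaly_score(
    image_emb: Sequence[float],
    normal_embs: Sequence[Sequence[float]],
    abnormal_embs: Sequence[Sequence[float]],
    temperature: float = 0.01,  # assumed value, not taken from the paper
) -> float:
    """Probability mass on abnormal prompts after a softmax over all prompts."""
    prompts = list(normal_embs) + list(abnormal_embs)
    logits = [cosine(image_emb, t) / temperature for t in prompts]
    peak = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    return sum(exps[len(normal_embs):]) / sum(exps)


# Toy 2-D embeddings: the first axis stands for "normal", the second for "defect".
print(anomaly_score([1.0, 1.0], [[1.0, 0.0]], [[0.0, 1.0]]))  # equal similarity, so 0.5
```

Summing over several prompts per class lets an LLM-generated prompt list act as a prompt ensemble; a threshold on the score (also an assumption here) would turn it into a normal/defective decision.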
Taxonomy
Topics: Anomaly Detection Techniques and Applications · COVID-19 diagnosis using AI · Advanced Neural Network Applications
Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Multi-Head Attention · Byte Pair Encoding · Weight Decay · Residual Connection · Softmax
