KAnoCLIP: Zero-Shot Anomaly Detection through Knowledge-Driven Prompt Learning and Enhanced Cross-Modal Integration
Chengyuan Li, Suyang Zhou, Jieping Kong, Lei Qi, Hui Xue

TL;DR
KAnoCLIP introduces a knowledge-driven prompt learning framework that enhances zero-shot anomaly detection by integrating general and image-specific knowledge, achieving state-of-the-art results across diverse datasets.
Contribution
This work proposes KAnoCLIP, a novel zero-shot anomaly detection (ZSAD) framework that eliminates fixed prompts and improves pixel-level detection through knowledge-driven learning and advanced cross-modal fusion.
Findings
Achieves state-of-the-art performance on 12 datasets.
Outperforms existing methods in generalization.
Enhances pixel-level anomaly segmentation.
Abstract
Zero-shot anomaly detection (ZSAD) identifies anomalies without needing training samples from the target dataset, which is essential for scenarios with privacy concerns or limited data. Vision-language models like CLIP show potential in ZSAD but have two limitations: manually crafted fixed textual descriptions or anomaly prompts are time-consuming to write and prone to semantic ambiguity, and CLIP struggles with pixel-level anomaly segmentation because it focuses more on global semantics than on local details. To address these limitations, we introduce KAnoCLIP, a novel ZSAD framework that leverages vision-language models. KAnoCLIP combines general knowledge from a Large Language Model (GPT-3.5) and fine-grained, image-specific knowledge from a Visual Question Answering system (Llama3) via Knowledge-Driven Prompt Learning (KnPL). KnPL uses a knowledge-driven (KD) loss function to create learnable anomaly…
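To make the CLIP-based setting concrete, the following is a minimal sketch of generic CLIP-style zero-shot anomaly scoring: each image patch embedding is compared with a "normal" and an "anomalous" text embedding, and a softmax over the two cosine similarities gives a per-patch anomaly probability. It is not KAnoCLIP's method (no KnPL, KD loss, or cross-modal fusion). The embeddings are random stand-ins for real encoder outputs, and the function name `anomaly_scores`, the 4x4 patch grid, and the temperature of 0.07 are assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def anomaly_scores(patch_feats, text_feats, temperature=0.07):
    """Per-patch anomaly probability.

    patch_feats: (N, D) patch embeddings (hypothetical image-encoder output).
    text_feats:  (2, D) text embeddings, row 0 = "normal", row 1 = "anomalous".
    temperature: assumed scaling factor, chosen for illustration only.
    """
    p = l2_normalize(patch_feats)
    t = l2_normalize(text_feats)
    logits = p @ t.T / temperature         # (N, 2) scaled cosine similarities
    return softmax(logits, axis=-1)[:, 1]  # probability of the "anomalous" prompt

# Random stand-ins for encoder outputs (assumption: 16 patches, 8-dim features).
rng = np.random.default_rng(0)
D = 8
text = rng.normal(size=(2, D))
patches = rng.normal(size=(16, D))
patches[5] = 3.0 * text[1]                 # make one patch align with the "anomalous" prompt

scores = anomaly_scores(patches, text)
anomaly_map = scores.reshape(4, 4)         # pixel-level (patch-level) anomaly map
image_score = scores.max()                 # one simple image-level score
```

Taking the maximum over patches is only one possible image-level aggregation; it is used here because it keeps the sketch short.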
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Seismology and Earthquake Studies · Advanced Data Processing Techniques
Methods: Softmax · Attention Is All You Need · ALIGN · Contrastive Language-Image Pre-training
