TL;DR
This paper introduces CoPS, a dynamic prompt synthesis framework that improves zero-shot anomaly detection by conditioning prompts on visual features, outperforming existing methods on industrial and medical datasets.
Contribution
CoPS synthesizes adaptive prompts conditioned on visual prototypes and uses a variational autoencoder to incorporate semantic image features into the prompts, with the aim of improving generalization in ZSAD.
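As a rough illustration of what "prompts conditioned on visual features" can mean, here is a minimal sketch, not the authors' implementation. It assumes a prototype is a plain mean over patch features, that prototype and latent tokens come from linear projections, and that the final prompt is the static tokens followed by the two image-dependent tokens. All function names, weights, and shapes are hypothetical.

```python
import math
import random


def mean_pool(patch_features):
    """Average a list of patch feature vectors into one prototype vector.
    (Assumption: the paper may extract prototypes differently.)"""
    dim = len(patch_features[0])
    count = len(patch_features)
    return [sum(p[i] for p in patch_features) / count for i in range(dim)]


def linear(vec, weight):
    """Apply a weight matrix (one row per output dimension) to a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def sample_latent(mu, log_var, rng):
    """VAE reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def synthesize_prompt(static_tokens, patch_features, w_proto, w_mu, w_logvar, rng):
    """Return static tokens plus two tokens that depend on the input image."""
    prototype = mean_pool(patch_features)
    proto_token = linear(prototype, w_proto)        # explicit visual conditioning
    mu = linear(prototype, w_mu)                    # hypothetical encoder head
    log_var = linear(prototype, w_logvar)           # hypothetical encoder head
    sampled_token = sample_latent(mu, log_var, rng) # stochastic semantic token
    return static_tokens + [proto_token, sampled_token]
```

The point of the sketch is only that the prompt changes with the image: two images with different patch features yield different token sequences, whereas purely static learnable tokens would be identical for both.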
Findings
CoPS achieves 1.4% higher classification AUROC than the previous state of the art.
CoPS improves segmentation AUROC by 1.9% over the previous state of the art.
CoPS demonstrates its effectiveness across 13 industrial and medical datasets.
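For readers unfamiliar with the metric, AUROC is the probability that a randomly chosen anomalous sample receives a higher anomaly score than a randomly chosen normal one. In anomaly detection work, classification AUROC is typically computed over image-level scores and segmentation AUROC over pixel-level scores. The following is a small illustrative sketch of that definition, not the paper's evaluation code; the function name and the label convention (1 = anomalous, 0 = normal) are assumptions.

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of (anomalous, normal) pairs ranked correctly.
    Ties count as half a correct ranking."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one anomalous and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

This pairwise form is quadratic in the number of samples, so it suits small examples; practical pixel-level evaluation usually relies on a sorted or library implementation instead.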
Abstract
Recently, large pre-trained vision-language models have shown remarkable performance in zero-shot anomaly detection (ZSAD). With fine-tuning on a single auxiliary dataset, the model enables cross-category anomaly detection on diverse datasets covering industrial defects and medical lesions. Compared to manually designed prompts, prompt learning eliminates the need for expert knowledge and trial-and-error. However, it still faces the following challenges: (i) static learnable tokens struggle to capture the continuous and diverse patterns of normal and anomalous states, limiting generalization to unseen categories; (ii) fixed textual labels provide overly sparse category information, making the model prone to overfitting to a specific semantic subspace. To address these issues, we propose Conditional Prompt Synthesis (CoPS), a novel framework that synthesizes dynamic prompts conditioned…
