AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection

Qihang Zhou; Guansong Pang; Yu Tian; Shibo He; Jiming Chen

arXiv:2310.18961·cs.CV·January 5, 2026·32 cites

TL;DR

AnomalyCLIP leverages large vision-language models by learning object-agnostic prompts to improve zero-shot anomaly detection across diverse domains, focusing on abnormal regions rather than object semantics.

Contribution

It introduces a prompt learning method that lets CLIP detect anomalies without any training samples from the target dataset, emphasizing abnormality recognition across varied object classes.

Findings

01. Outperforms existing zero-shot anomaly detection methods on 17 datasets.

02. Effective in diverse domains including defect inspection and medical imaging.

03. Achieves state-of-the-art zero-shot segmentation of anomalies.

Abstract

Zero-shot anomaly detection (ZSAD) requires detection models trained using auxiliary data to detect anomalies without any training sample in a target dataset. It is a crucial task when training data is not accessible due to various concerns, e.g., data privacy, yet it is challenging since the models need to generalize to anomalies across different domains where the appearance of foreground objects, abnormal regions, and background features, such as defects/tumors on different products/organs, can vary significantly. Recently, large pre-trained vision-language models (VLMs), such as CLIP, have demonstrated strong zero-shot recognition ability in various vision tasks, including anomaly detection. However, their ZSAD performance is weak since the VLMs focus more on modeling the class semantics of the foreground objects rather than the abnormality/normality in the images. In this paper we…
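To make the CLIP-style scoring idea in the abstract concrete, here is a minimal sketch of how an image could be scored against a "normal" and an "abnormal" text embedding. It is an illustration under stated assumptions, not the authors' implementation: no CLIP model is loaded, the three-dimensional vectors are made-up stand-ins for real embeddings, the names `cosine` and `anomaly_probability` are hypothetical, and the temperature of 0.07 is an assumed value.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def anomaly_probability(image_emb, normal_text_emb, abnormal_text_emb,
                        temperature=0.07):
    """Two-way softmax over similarities to the normal/abnormal prompts.

    Returns the probability mass assigned to the abnormal prompt.
    The temperature value is an assumption made for this sketch.
    """
    s_normal = cosine(image_emb, normal_text_emb) / temperature
    s_abnormal = cosine(image_emb, abnormal_text_emb) / temperature
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(s_normal, s_abnormal)
    e_normal = math.exp(s_normal - m)
    e_abnormal = math.exp(s_abnormal - m)
    return e_abnormal / (e_normal + e_abnormal)


# Toy stand-ins for embeddings (hypothetical values, not CLIP outputs).
normal_text = [1.0, 0.0, 0.0]
abnormal_text = [0.0, 1.0, 0.0]
clean_image = [0.9, 0.1, 0.0]
defect_image = [0.2, 0.8, 0.1]

print(anomaly_probability(clean_image, normal_text, abnormal_text))
print(anomaly_probability(defect_image, normal_text, abnormal_text))
```

In this sketch the object class never enters the score: only the image's relative similarity to the two prompts matters, which is the property the object-agnostic prompts are meant to encourage.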

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 5 · marginally below the acceptance threshold · Confidence 3

Strengths

1. The empirical evaluations are extensive, covering 17 diverse benchmarks.
2. The proposed approach attempts to learn class-agnostic prompts, which seems to contradict the pretrained CLIP that is mostly used to classify semantic objects. The authors successfully mitigate this issue through several technical modules, e.g., the object-agnostic text prompt design and the DPAM layer.

Weaknesses

1. The prompt template might be too restrictive. "damaged [cls]" may not represent all types of anomalies well. For example, if a component is missing, or if the method is applied to domains other than defect identification, the proposed prompt template may not work well.
2. According to Figure 2, the pipeline needs to feed the same images into two visual encoders, which would introduce additional computation overhead.
3. The proposed DPAM layer lacks a theoretical basis. It is not clear why replacing…

Reviewer 02 · Rating 8 · accept, good paper · Confidence 3

Strengths

The paper introducing AnomalyCLIP for zero-shot anomaly detection (ZSAD) stands out for its originality, exemplified by its novel approach of using object-agnostic text prompts for anomaly detection, a creative departure from traditional methods. The quality of the work is underscored by its robust methodology and extensive validation across 17 diverse datasets. The authors communicate their ideas clearly, making the complex concepts accessible. Significantly, AnomalyCLIP's abil…

Weaknesses

A weakness in the paper is the unexplained initial use of the term "glocal." Clarifying this key term when first mentioned would improve understanding and clarity.

Reviewer 03 · Rating 5 · marginally below the acceptance threshold · Confidence 4

Strengths

1. Leveraging a pre-trained model like CLIP to address anomaly detection is a good direction and an interesting topic.
2. The paper presents comprehensive and detailed experiments, with results that outperform other baselines.

Weaknesses

1. The idea and some techniques, like glocal context optimization, are similar to a concurrent work [1]. The authors could compare against such closely related work.
[1] Gu, Zhaopeng, et al. "AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models." arXiv preprint arXiv:2308.15366 (2023).
2. The DPAM strategy is confusing. The authors claim that Q-Q, K-K, and V-V self-attention each suffer from different issues, and that V-V self-attention yields the best result. However, these three variants…
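The Q-Q, K-K, and V-V self-attention variants mentioned in this review can be sketched as follows. This is a rough illustration under assumptions, not the paper's DPAM code: it assumes each variant simply swaps which projections form the attention scores while the values are still aggregated, and it uses a single head, tiny hand-written matrices, and hypothetical function names.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def attention(left, right, values):
    """Compute softmax(left @ right^T / sqrt(d)) @ values for one head."""
    d = len(left[0])
    scores = matmul(left, transpose(right))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values)


def qk_attention(q, k, v):
    """Standard self-attention: scores come from queries and keys."""
    return attention(q, k, v)


def qq_attention(q, v):
    """Assumed Q-Q variant: scores come from queries only."""
    return attention(q, q, v)


def kk_attention(k, v):
    """Assumed K-K variant: scores come from keys only."""
    return attention(k, k, v)


def vv_attention(v):
    """Assumed V-V variant: scores come from the values themselves."""
    return attention(v, v, v)


# Three tokens with two-dimensional projections (made-up numbers).
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

print(qk_attention(q, k, v))
print(vv_attention(v))
```

All four variants share the same aggregation step and differ only in which projections produce the attention weights, so each output row remains a weighted average of the value rows.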

Taxonomy

Topics: Anomaly Detection Techniques and Applications · COVID-19 diagnosis using AI · Data-Driven Disease Surveillance

Methods: Contrastive Language-Image Pre-training · Focus