IQE-CLIP: Instance-aware Query Embedding for Zero-/Few-shot Anomaly Detection in Medical Domain
Hong Huang, Weixiang Sun, Zhijian Wu, Jingwen Niu, Donghuan Lu, Xian Wu, Yefeng Zheng

TL;DR
IQE-CLIP introduces an instance-aware query embedding framework that enhances zero-/few-shot anomaly detection in medical images by integrating visual and textual information, outperforming existing methods.
Contribution
The paper proposes IQE-CLIP, a novel framework that leverages instance-aware query embeddings with class-based and learnable prompts for improved medical anomaly detection.
Findings
Achieves state-of-the-art results on six medical datasets.
Effective in both zero-shot and few-shot anomaly detection tasks.
Demonstrates the importance of instance-aware embeddings in medical imaging.
Abstract
Recently, the rapid advancement of vision-language models such as CLIP has led to significant progress in zero-/few-shot anomaly detection (ZFSAD) tasks. However, most existing CLIP-based ZFSAD methods assume prior knowledge of categories and rely on carefully crafted prompts tailored to specific scenarios. While such meticulously designed text prompts effectively capture semantic information in the textual space, they fall short of distinguishing normal and anomalous instances within the joint embedding space. Moreover, these ZFSAD methods have been explored predominantly in industrial scenarios, with few efforts devoted to medical tasks. To this end, we propose an innovative framework for ZFSAD tasks in the medical domain, denoted as IQE-CLIP. We reveal that query embeddings, which incorporate both textual and instance-aware visual information, are better indicators for…
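For readers unfamiliar with the prompt-based scoring that the abstract contrasts against, the following is a minimal sketch of generic CLIP-style zero-shot anomaly scoring: patch embeddings are compared with one "normal" and one "anomalous" text embedding by cosine similarity, and a softmax gives a per-patch anomaly probability. It is not IQE-CLIP's query-embedding mechanism, which the excerpt above does not describe in detail. The function names, the temperature value, and the toy 2-D embeddings are all assumptions made for illustration; real embeddings would come from a CLIP image and text encoder.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def anomaly_scores(patch_emb, text_emb, temperature=0.07):
    """Per-patch probability of the 'anomalous' prompt.

    patch_emb: (N, D) patch embeddings (hypothetical stand-in for encoder output)
    text_emb:  (2, D) text embeddings, row 0 = normal prompt, row 1 = anomalous prompt
    temperature: softmax temperature (assumed value, not taken from the paper)
    """
    p = l2_normalize(patch_emb)
    t = l2_normalize(text_emb)
    logits = (p @ t.T) / temperature               # (N, 2) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return probs[:, 1]


# Toy example with made-up 2-D embeddings.
text_emb = np.array([[1.0, 0.0],    # "normal" prompt direction
                     [0.0, 1.0]])   # "anomalous" prompt direction
patch_emb = np.array([[0.9, 0.1],   # close to the normal prompt
                      [0.1, 0.9],   # close to the anomalous prompt
                      [0.5, 0.5]])  # equidistant from both
scores = anomaly_scores(patch_emb, text_emb)
image_score = scores.max()          # one simple image-level aggregate (an assumption)
```

In this baseline the text embeddings are fixed across all images, which is the limitation the abstract points to: the prompts carry no instance-specific visual information.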
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Focus · Contrastive Language-Image Pre-training
