ClipSAM: CLIP and SAM Collaboration for Zero-Shot Anomaly Segmentation
Shengze Li, Jianjian Cao, Peng Ye, Yuhan Ding, Chongjun Tu, Tao Chen

TL;DR
ClipSAM uses CLIP's semantic understanding to localize anomalies and produce a rough segmentation, then passes that result to SAM as prompt constraints so SAM can refine the masks for zero-shot anomaly segmentation.
Contribution
This work introduces a novel collaboration framework, ClipSAM, integrating CLIP and SAM with multi-scale interaction and mask refinement for enhanced zero-shot anomaly segmentation.
Findings
Achieves state-of-the-art results on MVTec-AD and VisA datasets.
Effectively localizes anomalies using CLIP's semantic features.
Refines segmentation masks through hierarchical prompts with SAM.
Abstract
Recently, foundation models such as CLIP and SAM have shown promising performance on the task of Zero-Shot Anomaly Segmentation (ZSAS). However, both CLIP-based and SAM-based ZSAS methods still suffer from non-negligible drawbacks: 1) CLIP primarily focuses on global feature alignment across different inputs, leading to imprecise segmentation of local anomalous parts; 2) SAM tends to generate numerous redundant masks without proper prompt constraints, resulting in complex post-processing requirements. In this work, we propose a CLIP and SAM collaboration framework called ClipSAM for ZSAS. The insight behind ClipSAM is to employ CLIP's semantic understanding capability for anomaly localization and rough segmentation, which is then used as the prompt constraints for SAM to refine the anomaly segmentation results. In detail, we introduce a crucial Unified…
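The hand-off described in the abstract, where a rough segmentation becomes prompt constraints for SAM, can be illustrated with a minimal sketch. This is not the paper's actual module: the function name `prompts_from_rough_map`, the fixed threshold, and the choice of one box plus one peak point are all assumptions made for illustration. It only shows one plausible way to turn a per-pixel anomaly score map into a box prompt and a point prompt.

```python
import numpy as np


def prompts_from_rough_map(score_map, threshold=0.5):
    """Derive a box prompt and a point prompt from a rough anomaly score map.

    score_map: 2-D array of per-pixel anomaly scores in [0, 1]
               (a hypothetical stand-in for the CLIP-stage output).
    threshold: assumed cut-off separating anomalous from normal pixels.

    Returns (box_xyxy, point_xy), or (None, None) if no pixel exceeds
    the threshold.
    """
    score_map = np.asarray(score_map, dtype=np.float32)
    mask = score_map > threshold
    if not mask.any():
        return None, None

    # Tight bounding box around all above-threshold pixels, in (x0, y0, x1, y1) order.
    ys, xs = np.nonzero(mask)
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])

    # The highest-scoring pixel serves as a single positive point, in (x, y) order.
    y_peak, x_peak = np.unravel_index(np.argmax(score_map), score_map.shape)
    point = np.array([x_peak, y_peak])
    return box, point


# Toy 8x8 map with one anomalous blob and a single peak inside it.
toy = np.zeros((8, 8), dtype=np.float32)
toy[2:5, 3:6] = 0.8
toy[3, 4] = 0.95
box, point = prompts_from_rough_map(toy)
print(box, point)
```

A promptable segmenter that accepts box and point prompts could then take `box` and `point` as its constraints. How ClipSAM builds and orders its prompts is not specified in the truncated abstract above, so this sketch should not be read as the authors' procedure.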
Taxonomy
Topics: Anomaly Detection Techniques and Applications · COVID-19 diagnosis using AI · Data-Driven Disease Surveillance
Methods: Segment Anything Model · Contrastive Language-Image Pre-training
