CLIP3D-AD: Extending CLIP for 3D Few-Shot Anomaly Detection with Multi-View Images Generation
Zuo Zuo, Jiahao Dong, Yao Wu, Yanyun Qu, Zongze Wu

TL;DR
This paper introduces CLIP3D-AD, a novel method extending CLIP for 3D few-shot anomaly detection by synthesizing multi-view images and fusing features for improved classification and segmentation.
Contribution
The paper proposes a new approach that adapts CLIP for 3D anomaly detection using multi-view image synthesis and feature fusion, addressing modality discrepancies.
Findings
Achieves competitive results on the MVTec-3D AD dataset
Effectively combines multi-view features for improved detection
Demonstrates strong generalization in 3D anomaly tasks
Abstract
Few-shot anomaly detection methods can effectively address the difficulty of collecting data in industrial scenarios. Compared to 2D few-shot anomaly detection (2D-FSAD), 3D few-shot anomaly detection (3D-FSAD) is still an unexplored but essential task. In this paper, we propose CLIP3D-AD, an efficient 3D-FSAD method built on CLIP. We successfully transfer the strong generalization ability of CLIP to 3D-FSAD. Specifically, we synthesize anomalous images from the given normal images to form sample pairs that adapt CLIP for 3D anomaly classification and segmentation. For classification, we introduce an image adapter and a text adapter to fine-tune global visual features and text features. Meanwhile, we propose a coarse-to-fine decoder to fuse and facilitate intermediate multi-layer visual representations of CLIP. To benefit from the geometry information of point clouds and eliminate modality and data discrepancy…
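To make the classification step in the abstract more concrete, the following is a minimal sketch of how an adapter on top of frozen CLIP-style features could feed an anomaly score. It is not the authors' implementation: the residual bottleneck form, the mixing weight `alpha`, the temperature, the two-prompt setup ("normal" vs. "anomalous"), and all function names are assumptions made for illustration, and random vectors stand in for real CLIP embeddings. Only the image side is adapted here; a text adapter could be sketched the same way.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def residual_adapter(features, w_down, w_up, alpha=0.5):
    """Hypothetical bottleneck adapter: blend adapted features with the originals.

    alpha=0 returns the input unchanged; alpha=1 uses only the adapted branch.
    """
    hidden = np.maximum(features @ w_down, 0.0)  # down-projection + ReLU
    adapted = hidden @ w_up                      # up-projection back to feature dim
    return alpha * adapted + (1.0 - alpha) * features

def anomaly_score(image_feats, text_feats, temperature=0.07):
    """Softmax over cosine similarities to two text embeddings.

    Assumption: row 0 of text_feats encodes a "normal" prompt and row 1 an
    "anomalous" prompt. Returns the probability assigned to "anomalous".
    """
    img = l2_normalize(image_feats)
    txt = l2_normalize(text_feats)
    logits = img @ txt.T / temperature
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    return probs[..., 1]

# Placeholder data: random vectors stand in for CLIP global features.
rng = np.random.default_rng(0)
dim, bottleneck = 16, 4
image_feats = rng.normal(size=(3, dim))
text_feats = rng.normal(size=(2, dim))
w_down = rng.normal(size=(dim, bottleneck)) * 0.1
w_up = rng.normal(size=(bottleneck, dim)) * 0.1

adapted = residual_adapter(image_feats, w_down, w_up)
scores = anomaly_score(adapted, text_feats)
```

In a real pipeline the adapter weights would be trained on the synthesized normal/anomalous pairs while the CLIP backbone stays frozen; here they are random, so the scores carry no meaning beyond showing the data flow.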
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Image Processing Techniques and Applications · Medical Imaging Techniques and Applications
Methods: Adapter · Contrastive Language-Image Pre-training
