M3DM-NR: RGB-3D Noisy-Resistant Industrial Anomaly Detection via Multimodal Denoising
Chengjie Wang, Haokun Zhu, Jinlong Peng, Yue Wang, Ran Yi, Yunsheng Wu, Lizhuang Ma, Jiangning Zhang

TL;DR
This paper introduces M3DM-NR, a novel multi-modal framework for industrial anomaly detection that effectively handles noisy RGB-3D data by leveraging CLIP's discriminative capabilities across three stages.
Contribution
The paper proposes a three-stage noise-resistant multi-modal anomaly detection framework that integrates RGB, 3D, and text data, addressing noise issues in practical industrial scenarios.
Findings
Outperforms state-of-the-art methods in noisy multi-modal anomaly detection
Effective filtering of noise through multi-stage processing
Demonstrates robustness across diverse industrial datasets
Abstract
Existing industrial anomaly detection methods primarily concentrate on unsupervised learning with pristine RGB images. Yet both RGB and 3D data are crucial for anomaly detection, and datasets are seldom completely clean in practical scenarios. To address these challenges, this paper first delves into RGB-3D multi-modal noisy anomaly detection, proposing a novel noise-resistant M3DM-NR framework that leverages the strong multi-modal discriminative capabilities of CLIP. M3DM-NR consists of three stages: Stage-I introduces the Suspected References Selection module to filter a few normal samples from the training dataset, using the multimodal features extracted by the Initial Feature Extraction, and a Suspected Anomaly Map Computation module to generate a suspected anomaly map to focus on abnormal regions as reference. Stage-II uses the suspected anomaly maps of the reference…
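The abstract is cut off before it gives the scoring rule used in Stage-I, so the following is only a minimal sketch of the general idea of picking a few "suspected normal" reference samples from a noisy training set. It assumes each training sample is already represented by a feature vector and scores samples by their mean distance to their nearest neighbours; this kNN score is a stand-in chosen for illustration, not the paper's CLIP-based procedure, and the names `knn_outlier_scores` and `select_suspected_references` are hypothetical.

```python
import math


def knn_outlier_scores(features, k_nn=2):
    """Mean Euclidean distance from each sample to its k_nn nearest neighbours.

    Assumption for illustration only: a low score means the sample sits in a
    dense region of feature space and is therefore more likely to be normal.
    """
    if len(features) < 2:
        raise ValueError("need at least two samples to compute neighbour distances")
    scores = []
    for i, f in enumerate(features):
        # Distances to every other sample, smallest first.
        dists = sorted(math.dist(f, g) for j, g in enumerate(features) if j != i)
        nearest = dists[:k_nn]
        scores.append(sum(nearest) / len(nearest))
    return scores


def select_suspected_references(features, n_refs=3, k_nn=2):
    """Return the indices of the n_refs samples with the lowest outlier score."""
    scores = knn_outlier_scores(features, k_nn)
    order = sorted(range(len(features)), key=lambda i: scores[i])
    return order[:n_refs]


# Toy data: four samples in a tight cluster plus one far-away (noisy) sample.
features = [
    [0.0, 0.0],
    [0.1, 0.0],
    [0.0, 0.1],
    [0.1, 0.1],
    [5.0, 5.0],  # hypothetical anomalous sample mixed into the training set
]
refs = select_suspected_references(features, n_refs=3)
print(refs)  # indices drawn from the tight cluster; the far-away sample is excluded
```

In this toy setup the isolated sample receives a much larger score than the clustered ones, so it is never chosen as a reference. A real pipeline would replace the 2-D toy vectors with multimodal (RGB and 3D) features and would likely use a different scoring rule.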
Taxonomy
Topics: Anomaly Detection Techniques and Applications
Methods: Focus · Contrastive Language-Image Pre-training
