MMR-AD: A Large-Scale Multimodal Dataset for Benchmarking General Anomaly Detection with Multimodal Large Language Models
Xincheng Yao, Zefeng Qian, Chao Shi, Jiayang Song, Chongyang Zhang

TL;DR
This paper introduces MMR-AD, a large-scale multimodal dataset for benchmarking general anomaly detection with multimodal large language models, revealing current models' limitations and proposing a reasoning-based baseline for improvement.
Contribution
The paper presents MMR-AD, a comprehensive benchmark dataset for MLLM-based anomaly detection, and introduces Anomaly-R1, a reasoning-enhanced baseline model that outperforms existing generalist MLLMs.
Findings
Current SOTA MLLMs underperform in industrial anomaly detection tasks.
Anomaly-R1 significantly improves detection and localization performance over existing generalist MLLMs.
MMR-AD reveals gaps between pretraining data and AD scenario requirements.
Abstract
In the progress of industrial anomaly detection, general anomaly detection (GAD) is an emerging trend and also the ultimate goal. Unlike conventional single- and multi-class AD, general AD aims to train a general AD model that can directly detect anomalies in diverse novel classes without any retraining or fine-tuning on the target data. Recently, Multimodal Large Language Models (MLLMs) have shown great promise in achieving general anomaly detection due to their revolutionary visual understanding and language reasoning capabilities. However, the general AD ability of MLLMs remains underexplored because: (1) MLLMs are pretrained on data sourced from the Web, which still differ significantly from the data in AD scenarios. Moreover, the image-text pairs used during pretraining are not specific to AD tasks. (2) The current mainstream AD datasets are image-based and not…
