MMAD: A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection

Xi Jiang; Jian Li; Hanqiu Deng; Yong Liu; Bin-Bin Gao; Yifeng Zhou; Jialin Li; Chengjie Wang; Feng Zheng

arXiv:2410.09453 · cs.AI · February 24, 2025 · 3 cites

TL;DR

This paper introduces MMAD, a comprehensive benchmark for evaluating multimodal large language models in industrial anomaly detection, revealing current models' limitations and exploring potential performance enhancement strategies.

Contribution

It presents the first full-spectrum benchmark dataset and evaluation framework for MLLMs in industrial anomaly detection, together with a novel pipeline for generating the benchmark data and a comprehensive analysis of current models.

Findings

1. Commercial models perform best: GPT-4o models reach an average accuracy of 74.9%.
2. Current MLLMs still fall short of industrial requirements.
3. Training-free strategies show potential for improving model performance.

Abstract

In industrial inspection, Multimodal Large Language Models (MLLMs) have high potential to reshape practical applications thanks to their strong language capabilities and generalization abilities. However, despite their impressive problem-solving skills in many domains, their ability in industrial anomaly detection has not been systematically studied. To bridge this gap, we present MMAD, the first full-spectrum MLLM benchmark for industrial Anomaly Detection. We define seven key subtasks of MLLMs in industrial inspection and design a novel pipeline to generate the MMAD dataset, which contains 39,672 questions for 8,366 industrial images. With MMAD, we conduct a comprehensive, quantitative evaluation of various state-of-the-art MLLMs. The commercial models perform best, with GPT-4o models reaching an average accuracy of 74.9%. However, this result…
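The abstract reports results as accuracy over benchmark questions but does not spell out the scoring rule on this page. Below is a minimal Python sketch of how such a score could be computed, assuming each question is multiple-choice with a single correct option letter and taking "average accuracy" to mean the unweighted mean of per-subtask accuracies. The `Record` type, the subtask labels, and `subtask_accuracy` are hypothetical names, not part of the official MMAD code, and the paper's exact protocol may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical record type; not taken from the official MMAD repository.
    subtask: str      # hypothetical subtask label, e.g. "discrimination"
    answer: str       # ground-truth option letter, e.g. "A"
    prediction: str   # option letter chosen by the model


def subtask_accuracy(records):
    """Return per-subtask accuracy and the unweighted mean across subtasks.

    Assumes multiple-choice questions scored by exact option-letter match.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r.subtask] += 1
        # Compare option letters case-insensitively after trimming whitespace.
        if r.prediction.strip().upper() == r.answer.strip().upper():
            correct[r.subtask] += 1
    per_task = {t: correct[t] / total[t] for t in total}
    # Macro average: each subtask counts equally, regardless of its size.
    macro = sum(per_task.values()) / len(per_task) if per_task else 0.0
    return per_task, macro


if __name__ == "__main__":
    demo = [
        Record("discrimination", "A", "A"),
        Record("discrimination", "B", "a"),
        Record("localization", "C", " c "),
    ]
    per_task, macro = subtask_accuracy(demo)
    print(per_task, macro)  # prints {'discrimination': 0.5, 'localization': 1.0} 0.75
```

Averaging per subtask rather than per question keeps subtasks with many questions from dominating the overall score; a per-question (micro) average is the other common choice and can give a different number on an unbalanced benchmark.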

Code & Models

Repositories

jam-cc/mmad (PyTorch, official)

Datasets

jiang-cc/MMAD (dataset, 1.4k downloads)

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Occupational Health and Safety Research