TL;DR
This paper introduces MMVIAD, a continuous multi-view video dataset for industrial anomaly detection, along with a multi-task benchmark and a two-stage post-training pipeline; the VISTA model raises the average task score from 45.0 to 57.5 on MMVIAD-Unseen.
Contribution
The authors present what is, to the best of their knowledge, the first continuous multi-view video dataset for industrial anomaly detection and understanding, and develop a two-stage post-training pipeline that improves model accuracy.
Findings
Current commercial and open-source video MLLMs perform below human level on MMVIAD.
The VISTA model improves the average task score from 45.0 to 57.5 on MMVIAD-Unseen.
Source code is publicly available at the provided GitHub link.
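To put the reported score change in perspective, the short calculation below derives the absolute and relative gain from the two numbers quoted above. The variable names are illustrative only, and the calculation says nothing about how the average task score itself is computed, which this summary does not specify.

```python
# Average task scores on MMVIAD-Unseen as quoted in the findings above.
score_before = 45.0
score_after = 57.5

# Absolute gain in score points.
absolute_gain = score_after - score_before

# Relative gain as a percentage of the starting score.
relative_gain_pct = absolute_gain / score_before * 100

print(f"absolute gain: {absolute_gain:.1f} points")
print(f"relative gain: {relative_gain_pct:.1f}%")
```

The gain is 12.5 points, or roughly a 27.8% relative improvement over the starting score.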
Abstract
Industrial anomaly detection is critical for manufacturing quality control, yet existing datasets mainly focus on static images or sparse views, which do not fully reflect continuous inspection processes in real industrial scenarios. We introduce MMVIAD (Multi-view Multi-task Video Industrial Anomaly Detection), to the best of our knowledge the first continuous multi-view video dataset for industrial anomaly detection and understanding, together with a benchmark for multi-task evaluation. MMVIAD contains object-centric 2-second inspection clips with approximately 120 degrees of camera motion, covering 48 object categories, 14 environments, and 6 structural anomaly types. It supports anomaly detection, defect classification, object classification, and anomaly visible-time localization. Systematic evaluations on MMVIAD show that current commercial and open-source video MLLMs remain far…
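As a rough illustration of how one clip could carry labels for all four tasks listed in the abstract, here is a minimal sketch of a per-clip annotation record. Every name in it (`InspectionClip`, `task_targets`, the field names, and the example values) is hypothetical and is not taken from the MMVIAD release; only the 2-second clip length and the four task names come from the abstract.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CLIP_SECONDS = 2.0  # clip length stated in the abstract


@dataclass
class InspectionClip:
    """Hypothetical annotation record for one inspection clip."""
    video_path: str
    object_category: str          # one of the 48 object categories
    environment: str              # one of the 14 environments
    defect_type: Optional[str]    # one of the 6 anomaly types, None if normal
    visible_span: Optional[Tuple[float, float]]  # seconds the defect is visible

    def __post_init__(self) -> None:
        # A defect label and a visible-time span must appear together.
        if (self.defect_type is None) != (self.visible_span is None):
            raise ValueError("defect_type and visible_span must both be set or both be None")
        if self.visible_span is not None:
            start, end = self.visible_span
            if not (0.0 <= start < end <= CLIP_SECONDS):
                raise ValueError("visible_span must lie within the clip duration")

    @property
    def is_anomalous(self) -> bool:
        return self.defect_type is not None


def task_targets(clip: InspectionClip) -> dict:
    """Map one record to a target for each of the four tasks."""
    return {
        "anomaly_detection": clip.is_anomalous,
        "defect_classification": clip.defect_type,
        "object_classification": clip.object_category,
        "visible_time_localization": clip.visible_span,
    }


# Example values are made up for illustration.
clip = InspectionClip("clip_0001.mp4", "gear", "workbench", "crack", (0.4, 1.3))
print(task_targets(clip))
```

Keeping the defect label and visible span in one record means a single clip can be scored on all four tasks without joining separate annotation files; the actual dataset layout may differ.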
