MMVIAD: Multi-view Multi-task Video Understanding for Industrial Anomaly Detection

Xiran Zhao; Jing Jin; Yan Bai; Zhongan Wang; Yifeng Sun; Yihang Lou; Xuanyu Zhu; Tao Feng; Yingna Wu

arXiv:2605.10833·cs.CV·May 12, 2026

TL;DR

This paper introduces MMVIAD, a continuous multi-view video dataset for industrial anomaly detection, along with a multi-task benchmark and a two-stage post-training pipeline that improves model performance on that benchmark.

Contribution

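The authors present what is, to the best of their knowledge, the first continuous multi-view video dataset for industrial anomaly detection and understanding, together with a benchmark for multi-task evaluation, and develop a two-stage post-training pipeline that improves model accuracy on it.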

Findings

01 Current video MLLMs perform below human level on MMVIAD.

02 The VISTA model improves the average task score on MMVIAD-Unseen from 45.0 to 57.5.

03 Source code is publicly available in the Georgekeepmoving/MMVIAD repository on GitHub.
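
To put the second finding in perspective, the reported change from 45.0 to 57.5 can be expressed as both an absolute and a relative gain. The snippet below is only a worked-arithmetic sketch; the helper name `score_gain` is hypothetical and is not taken from the paper or its repository.

```python
def score_gain(baseline: float, improved: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = improved - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


# Average task scores on MMVIAD-Unseen as reported in the findings.
absolute, relative = score_gain(45.0, 57.5)
print(f"+{absolute:.1f} points ({relative:.1f}% relative)")
# prints "+12.5 points (27.8% relative)"
```

The absolute gain (12.5 points) is the more conservative way to report the improvement; the relative figure depends on how low the baseline is.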

Abstract

Industrial anomaly detection is critical for manufacturing quality control, yet existing datasets mainly focus on static images or sparse views, which do not fully reflect continuous inspection processes in real industrial scenarios. We introduce MMVIAD (Multi-view Multi-task Video Industrial Anomaly Detection), to the best of our knowledge the first continuous multi-view video dataset for industrial anomaly detection and understanding, together with a benchmark for multi-task evaluation. MMVIAD contains object-centric 2-second inspection clips with approximately 120 degrees of camera motion, covering 48 object categories, 14 environments, and 6 structural anomaly types. It supports anomaly detection, defect classification, object classification, and anomaly visible-time localization. Systematic evaluations on MMVIAD show that current commercial and open-source video MLLMs remain far…
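
The visible portion of the abstract does not say how anomaly visible-time localization is scored. One common choice for temporal localization tasks in general is temporal intersection-over-union (IoU) between a predicted and an annotated time interval. The sketch below assumes that metric purely for illustration; the function `temporal_iou`, the interval format, and the example timestamps are hypothetical and not taken from the paper.

```python
def temporal_iou(pred: tuple[float, float], truth: tuple[float, float]) -> float:
    """Temporal IoU of two (start, end) intervals given in seconds.

    Assumed metric for illustration only; the paper may score
    visible-time localization differently.
    """
    pred_start, pred_end = pred
    truth_start, truth_end = truth
    # Overlap is zero when the intervals do not intersect.
    intersection = max(0.0, min(pred_end, truth_end) - max(pred_start, truth_start))
    union = (pred_end - pred_start) + (truth_end - truth_start) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical example inside a 2-second clip: the anomaly is annotated as
# visible from 1.0 s to 2.0 s, and a model predicts 0.5 s to 1.5 s.
iou = temporal_iou((0.5, 1.5), (1.0, 2.0))
print(f"temporal IoU = {iou:.3f}")
```

With clips as short as 2 seconds, small timing errors move the IoU substantially, which is one reason a localization task on such clips can be demanding.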

Code & Models

Repositories

Georgekeepmoving/MMVIAD
github