OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning
Shifang Zhao, Yiheng Lin, Lu Han, Yao Zhao, Yunchao Wei

TL;DR
OmniAD is a multimodal framework that combines visual and textual reasoning to detect and analyze industrial anomalies, outperforming existing models on multiple benchmarks.
Contribution
The paper introduces OmniAD, a novel multimodal reasoning framework that unifies anomaly detection and anomaly understanding, combining visual and textual reasoning for fine-grained industrial analysis.
Findings
Achieves a score of 79.1 on the MMAD benchmark, outperforming prior models.
Effectively integrates visual perception with reasoning for anomaly analysis.
Demonstrates strong performance across multiple benchmarks.
Abstract
While anomaly detection has made significant progress, generating detailed analyses that incorporate industrial knowledge remains a challenge. To address this gap, we introduce OmniAD, a novel framework that unifies anomaly detection and understanding for fine-grained analysis. OmniAD is a multimodal reasoner that combines visual and textual reasoning processes. The visual reasoning provides detailed inspection by leveraging Text-as-Mask Encoding to perform anomaly detection through text generation without manually selected thresholds. Following this, Visual Guided Textual Reasoning conducts comprehensive analysis by integrating visual perception. To enhance few-shot generalization, we employ an integrated training strategy that combines supervised fine-tuning (SFT) with reinforcement learning (GRPO), incorporating three sophisticated reward functions. Experimental results demonstrate…
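The abstract names GRPO (Group Relative Policy Optimization) as the reinforcement-learning stage. A core step in GRPO is to score a group of responses sampled for the same input and normalize each reward against that group's mean and standard deviation, which removes the need for a learned value function. The sketch below shows only that step. It assumes the three reward functions are combined by a weighted sum; the helper name `total_reward` and its equal default weights are hypothetical, since the abstract does not define the rewards or how they are combined. This is an illustration of the general technique, not the authors' implementation.

```python
import statistics
from typing import List, Optional


def total_reward(components: List[float], weights: Optional[List[float]] = None) -> float:
    """Hypothetical: combine several reward terms (e.g. three reward functions)
    into one scalar via a weighted sum. Equal weights of 1.0 are an assumption."""
    if weights is None:
        weights = [1.0] * len(components)
    if len(weights) != len(components):
        raise ValueError("weights and components must have the same length")
    return sum(w * c for w, c in zip(weights, components))


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each response's reward against the
    group of responses sampled for the same prompt, (r_i - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population standard deviation over the group
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled responses for one image/prompt, each scored by three
    # hypothetical reward terms.
    group = [[1.0, 0.5, 1.0], [0.0, 0.5, 1.0], [1.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
    rewards = [total_reward(c) for c in group]
    print(group_relative_advantages(rewards))
```

Responses scoring above the group mean receive positive advantages and those below receive negative ones; if every response in a group scores the same, all advantages are zero, so that group contributes no policy-gradient signal.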
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Data Visualization and Analytics · Multimodal Machine Learning Applications
