TL;DR
This paper introduces IAD-R1, a universal post-training framework that significantly improves vision-language models' ability to detect industrial anomalies by enhancing perception and reasoning capabilities through a two-stage training process.
Contribution
IAD-R1 is a novel two-stage training framework that enhances VLMs for industrial anomaly detection, combining high-quality reasoning datasets and policy optimization.
Findings
Average accuracy improved by 43.3% on the DAGM dataset
A 0.5B-parameter model surpasses GPT-4.1 and Claude-Sonnet-4 in zero-shot detection
Significant performance gains across seven VLM architectures
Abstract
Industrial anomaly detection is a critical component of modern manufacturing, yet the scarcity of defective samples restricts traditional detection methods to scenario-specific applications. Although Vision-Language Models (VLMs) demonstrate significant advantages in generalization capabilities, their performance in industrial anomaly detection remains limited. To address this challenge, we propose IAD-R1, a universal post-training framework applicable to VLMs of different architectures and parameter scales, which substantially enhances their anomaly detection capabilities. IAD-R1 employs a two-stage training strategy: the Perception Activation Supervised Fine-Tuning (PA-SFT) stage utilizes a meticulously constructed high-quality Chain-of-Thought dataset (Expert-AD) for training, enhancing anomaly perception capabilities and establishing reasoning-to-answer correlations; the Structured…