IAD-Unify: A Region-Grounded Unified Model for Industrial Anomaly Segmentation, Understanding, and Generation

Haoyu Zheng; Tianwei Lin; Wei Wang; Zhuonan Wang; Wenqiao Zhang; Jiaqi Zhu; Feifei Shao

arXiv:2604.12440 · cs.CV · April 15, 2026

TL;DR

IAD-Unify is an industrial anomaly detection (IAD) framework that supports defect localization, natural-language explanation, and controlled defect editing in a single model, paired with a unified evaluation platform.

Contribution

The paper introduces a dual-encoder model in which a frozen DINOv2-based region expert feeds anomaly evidence to a shared Qwen3.5-4B vision-language backbone, along with Anomaly-56K, a new multi-task evaluation platform for industrial anomaly detection.

Findings

01

Region grounding is crucial for understanding: removing it lowers accuracy by more than 76 percentage points.

02

Performance with predicted regions nearly matches performance with oracle regions, suggesting the model is practical to deploy.

03

Region-grounded generation improves image fidelity and perceptual quality.

Abstract

Real-world industrial inspection requires not only localizing defects, but also explaining them in natural language and generating controlled defect edits. However, existing approaches fail to jointly support all three capabilities within a unified framework and evaluation protocol. We propose IAD-Unify, a dual-encoder unified framework in which a frozen DINOv2-based region expert supplies precise anomaly evidence to a shared Qwen3.5-4B vision-language backbone via lightweight token injection, jointly enabling anomaly segmentation, region-grounded understanding, and mask-guided generation. To enable unified evaluation, we further construct Anomaly-56K, a comprehensive unified multi-task IAD evaluation platform, spanning 59,916 images across 24 categories and 104 defect variants. Controlled ablations yield four findings: (i) region grounding is the decisive mechanism for understanding,…
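
The abstract says only that the region expert passes anomaly evidence to the backbone via "lightweight token injection" and does not describe the mechanism. Purely as an illustration, the sketch below assumes one plausible reading: the features of the highest-scoring patches are projected into the backbone's embedding space and prepended to its token sequence. The function name `inject_region_tokens`, the top-k selection, the prepend position, and all dimensions are hypothetical and are not taken from the paper.

```python
import numpy as np

def inject_region_tokens(backbone_tokens, patch_feats, anomaly_scores, proj, k=4):
    """Hypothetical token injection (an assumption, not the paper's method).

    backbone_tokens: (T, d_model) token embeddings of the vision-language backbone
    patch_feats:     (P, d_region) patch features from a frozen region expert
    anomaly_scores:  (P,) per-patch anomaly scores
    proj:            (d_region, d_model) small projection, the only trainable part here
    """
    # Indices of the k patches with the highest anomaly score, best first.
    top = np.argsort(anomaly_scores)[::-1][:k]
    # Map the selected patch features into the backbone's embedding space.
    region_tokens = patch_feats[top] @ proj
    # Prepend the region tokens so the backbone can attend to them.
    return np.concatenate([region_tokens, backbone_tokens], axis=0)

# Toy usage with made-up sizes.
rng = np.random.default_rng(0)
T, P, d_region, d_model, k = 6, 16, 8, 12, 4
backbone_tokens = rng.normal(size=(T, d_model))
patch_feats = rng.normal(size=(P, d_region))
anomaly_scores = rng.random(P)
proj = rng.normal(size=(d_region, d_model))

out = inject_region_tokens(backbone_tokens, patch_feats, anomaly_scores, proj, k=k)
print(out.shape)
```

In this reading the backbone's own tokens are left unchanged and only the small projection would need training, which is one way an injection step could stay lightweight while the region expert remains frozen.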
