AD-Copilot: A Vision-Language Assistant for Industrial Anomaly Detection via Visual In-context Comparison

Xi Jiang; Yue Guo; Jian Li; Yong Liu; Bin-Bin Gao; Hanqiu Deng; Jun Liu; Heng Zhao; Chengjie Wang; and Feng Zheng

arXiv:2603.13779·cs.CV·April 22, 2026

TL;DR

AD-Copilot is a multimodal large language model specialized for industrial anomaly detection (IAD). It uses visual in-context comparison and a new dataset, Chat-AD, to outperform existing models and to surpass human experts on several IAD tasks.

Contribution

The paper introduces AD-Copilot, a vision-language model with a comparison encoder, along with a large-scale multimodal industrial dataset, Chat-AD. Together they advance IAD performance beyond prior models.

Findings

01. Achieves 82.3% accuracy on the MMAD benchmark

02. Outperforms all models without data leakage

03. Surpasses human experts on several IAD tasks

Abstract

Multimodal Large Language Models (MLLMs) have achieved impressive success in natural visual understanding, yet they consistently underperform in industrial anomaly detection (IAD). This is because the general web data on which MLLMs are mostly trained differs significantly from industrial images. Moreover, they encode each image independently and can only compare images in the language space, making them insensitive to subtle visual differences that are key to IAD. To tackle these issues, we present AD-Copilot, an interactive MLLM specialized for IAD via visual in-context comparison. We first design a novel data curation pipeline to mine inspection knowledge from sparsely labeled industrial images and generate precise samples for captioning, VQA, and defect localization, yielding a large-scale multimodal dataset Chat-AD rich in semantic signals for IAD. On this foundation, AD-Copilot incorporates a…
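To make the idea of comparing images in feature space rather than in language space concrete, here is a minimal sketch. It is not the paper's comparison encoder, which the truncated abstract does not describe. It assumes each image has already been encoded into a list of per-patch feature vectors, and it scores each patch by cosine distance to the matching patch of a defect-free reference. The function names, the toy features, and the 0.1 threshold are all hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Treat an all-zero feature vector as maximally dissimilar.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def patch_difference_map(query_patches, reference_patches):
    """Compare a query image to a reference, patch by patch, in feature space."""
    if len(query_patches) != len(reference_patches):
        raise ValueError("query and reference must have the same number of patches")
    return [cosine_distance(q, r) for q, r in zip(query_patches, reference_patches)]


def flag_patches(diff_map, threshold=0.1):
    """Return indices of patches whose distance exceeds the (hypothetical) threshold."""
    return [i for i, d in enumerate(diff_map) if d > threshold]


# Toy 2-D patch features (made up for illustration): only the last patch differs.
reference = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]

diff_map = patch_difference_map(query, reference)
flagged = flag_patches(diff_map)
print(flagged)
```

A per-patch score like this stays sensitive to a small local change. By contrast, a model that first describes each image in text and then compares the descriptions can miss that change, which is the limitation the abstract points to.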

Code & Models

Models

jiang-cc/AD-Copilot (Hugging Face model, 364 downloads, 1 like)