IADGPT: Unified LVLM for Few-Shot Industrial Anomaly Detection, Localization, and Reasoning via In-Context Learning

Mengyang Zhao; Teng Fu; Haiyang Yu; Ke Niu; Bin Li

arXiv:2508.10681·cs.CV·August 15, 2025

TL;DR

IADGPT is a unified large vision-language model designed for few-shot industrial anomaly detection, localization, and reasoning, using a three-stage training process and a new extensive dataset to improve industrial quality inspection tasks.

Contribution

The paper introduces IADGPT, a novel LVLM framework with a three-stage training strategy and in-context learning, specifically tailored for industrial anomaly detection and reasoning.

Findings

01. Significantly improves anomaly detection accuracy.

02. Demonstrates effective localization and reasoning capabilities.

03. Remains competitive with existing methods across diverse industrial scenarios.

Abstract

Few-Shot Industrial Anomaly Detection (FS-IAD) has important applications in automating industrial quality inspection. Recently, several FS-IAD methods based on Large Vision-Language Models (LVLMs) have been proposed and have made some progress through prompt learning or fine-tuning. However, existing LVLMs focus on general tasks and lack the basic industrial knowledge and reasoning capabilities that FS-IAD requires, leaving these methods far short of specialized human quality inspectors. To address these challenges, we propose a unified framework, IADGPT, designed to perform FS-IAD in a human-like manner while also handling the associated localization and reasoning tasks, even for diverse and novel industrial products. To this end, we introduce a three-stage progressive training strategy inspired by humans. Specifically, the first two stages gradually guide IADGPT in acquiring fundamental industrial…
