AgentIAD: Agentic Industrial Anomaly Detection via Adaptive Memory Augmentation

Junwen Miao; Penghui Du; Yingying Fan; Yi Liu; Yu Wang; Runze He; Lida Huang; Yan Wang

arXiv:2512.13671·cs.CV·April 17, 2026

TL;DR

AgentIAD introduces an iterative, memory-augmented vision-language framework for industrial anomaly detection, enabling active evidence gathering and improved accuracy over state-of-the-art methods.

Contribution

The paper presents an agentic inspection approach with dynamic access to visual and retrieved memory, together with a two-stage training strategy for anomaly detection.

Findings

01

Improves classification accuracy by 5.92% on the MMAD benchmark.

02

Enables multi-round reasoning for more reliable anomaly analysis.

03

Outperforms previous state-of-the-art methods with the same backbone.

Abstract

Industrial anomaly detection (IAD) is challenging due to the subtle and highly localized nature of many defects, which single-pass vision-language models (VLMs) often fail to capture. Moreover, existing approaches lack mechanisms to actively acquire complementary evidence during inference. We propose AgentIAD, an agentic vision-language framework that enables iterative industrial inspection through a unified action space. The agent dynamically accesses two forms of memory during inspection: visual memory via the Perceptive Zoomer (PZ) for fine-grained local analysis, and retrieved memory via the Web Searcher (WS) and Comparative Retriever (CR) for external knowledge acquisition and cross-instance verification. This design allows the model to progressively gather evidence through multi-round perception-action reasoning. To effectively learn such policies under sparse supervision,…
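The multi-round perception-action loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Action` enum, the terminal `ANSWER` action, the `inspect` function, the round budget, the "undecided" fallback, and the toy tools are all hypothetical names and choices introduced here. Only the three tool roles (PZ, WS, CR) come from the abstract.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Action(Enum):
    """Hypothetical unified action space; tool roles follow the abstract."""
    ZOOM = "perceptive_zoomer"         # PZ: visual memory, local analysis
    SEARCH = "web_searcher"            # WS: retrieved memory, external knowledge
    COMPARE = "comparative_retriever"  # CR: retrieved memory, cross-instance check
    ANSWER = "answer"                  # assumed terminal action (not in the abstract)


Evidence = List[Tuple[Action, str]]
Policy = Callable[[Evidence], Tuple[Action, str]]
Tool = Callable[[str], str]


@dataclass
class Episode:
    evidence: Evidence = field(default_factory=list)
    verdict: str = ""


def inspect(policy: Policy, tools: Dict[Action, Tool], max_rounds: int = 4) -> Episode:
    """Run up to max_rounds policy decisions, accumulating tool observations."""
    episode = Episode()
    for _ in range(max_rounds):
        action, argument = policy(episode.evidence)
        if action is Action.ANSWER:
            episode.verdict = argument
            return episode
        # Any non-terminal action calls a tool and stores what it returned.
        episode.evidence.append((action, tools[action](argument)))
    # Assumed fallback when the round budget runs out without an answer.
    episode.verdict = "undecided"
    return episode


def scripted_policy(evidence: Evidence) -> Tuple[Action, str]:
    """Stand-in for the VLM policy: zoom, compare, then answer."""
    plan = [
        (Action.ZOOM, "top-left"),
        (Action.COMPARE, "normal reference"),
        (Action.ANSWER, "anomalous"),
    ]
    return plan[min(len(evidence), len(plan) - 1)]


# Toy tools that only echo their input; real tools would crop images,
# query the web, or retrieve reference samples.
toy_tools: Dict[Action, Tool] = {
    Action.ZOOM: lambda region: f"crop of {region}",
    Action.SEARCH: lambda query: f"notes on {query}",
    Action.COMPARE: lambda query: f"match against {query}",
}

result = inspect(scripted_policy, toy_tools)
```

In this sketch the policy sees only the evidence gathered so far, which is one simple way to make each round depend on earlier observations. How AgentIAD actually conditions its policy is not stated in the visible part of the abstract.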
