AnomalyClaw: A Universal Visual Anomaly Detection Agent via Tool-Grounded Refutation
Xi Jiang, Yinjie Zhao, Zesheng Yang, Feng Zheng

TL;DR
AnomalyClaw is a training-free visual anomaly detection agent that uses a multi-round refutation process with a tool library to improve cross-domain anomaly detection performance.
Contribution
Introduces a multi-round refutation procedure, grounded in a tool library, that improves VLM-based anomaly detection without any training.
Findings
Gains 6.23 percentage points in macro-AUROC over direct inference with GPT-5.5.
Improves performance with other VLM backbones as well, including Seed2.0-lite and Qwen3.5-VL-27B.
Self-evolution extension further boosts accuracy without oracle labels.
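To make the headline metric concrete, here is a minimal sketch of how a macro-AUROC and a percentage-point gain could be computed. It assumes "macro" means an unweighted mean of per-dataset AUROC values, which this summary does not spell out. The function names and the toy labels and scores are made up for illustration and are not results from the paper.

```python
from statistics import mean


def auroc(labels, scores):
    """AUROC as the probability that a random anomalous sample (label 1)
    receives a higher score than a random normal sample (label 0).
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_auroc(per_dataset):
    """Unweighted mean of per-dataset AUROC (assumed meaning of 'macro')."""
    return mean(auroc(y, s) for y, s in per_dataset.values())


def gain_pp(agent, baseline):
    """Difference between two AUROC values in percentage points."""
    return 100.0 * (agent - baseline)


# Toy data only: two hypothetical datasets with (labels, scores).
toy = {
    "dataset_a": ([0, 1], [0.2, 0.9]),
    "dataset_b": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
}
toy_macro = macro_auroc(toy)
```

Because every dataset contributes equally to the mean, a small dataset influences the macro score as much as a large one.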
Abstract
Visual anomaly detection (VAD) is crucial in many real-world fields, such as industrial inspection, medical imaging, infrastructure monitoring, and remote sensing. However, anomaly definitions, data modalities, and annotation standards differ across domains, which makes it difficult to transfer VAD models trained on a single domain. Vision-language models (VLMs), pre-trained on large-scale cross-domain data, can perform visual perception under task instructions, offering a promising solution for cross-domain VAD. However, single-inference VLM judgments are unreliable, since they rely more on prior knowledge than on normal-sample references or fine-grained feature evidence. We therefore present AnomalyClaw, a training-free VAD agent that turns anomaly judgment into a multi-round refutation process. In each round, the agent proposes candidate anomalies and refutes each against…
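The abstract is cut off before it says what candidates are refuted against, so the following is only a generic sketch of a propose-then-refute loop, not the authors' implementation. Every name in it (`Candidate`, `refutation_loop`, the `propose` callback, the toy tool) is hypothetical, and the rule that a candidate is dropped as soon as any tool refutes it is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Candidate:
    """A hypothesised anomaly plus a log of tool verdicts (hypothetical type)."""
    description: str
    evidence: List[str] = field(default_factory=list)


# Assumption: a tool returns True when its evidence refutes the candidate.
Tool = Callable[[Candidate], bool]
Proposer = Callable[[int, List[Candidate]], List[Candidate]]


def refutation_loop(propose: Proposer, tools: Dict[str, Tool],
                    max_rounds: int = 3) -> List[Candidate]:
    """Run up to max_rounds of propose-then-refute and return the survivors."""
    survivors: List[Candidate] = []
    for round_idx in range(max_rounds):
        candidates = propose(round_idx, survivors)
        if not candidates:
            break  # nothing new to examine; keep the previous survivors
        survivors = []
        for cand in candidates:
            refuted = False
            for name, tool in tools.items():
                if tool(cand):
                    cand.evidence.append(f"refuted by {name}")
                    refuted = True
                    break
                cand.evidence.append(f"survived {name}")
            if not refuted:
                survivors.append(cand)
        if not survivors:
            break  # every candidate was refuted; treat the image as normal
    return survivors


# Toy stand-ins: a real agent would query a VLM and real image tools here.
def toy_propose(round_idx: int, survivors: List[Candidate]) -> List[Candidate]:
    if round_idx == 0:
        return [Candidate("scratch on surface"), Candidate("color shift")]
    return survivors  # later rounds re-examine whatever survived


toy_tools: Dict[str, Tool] = {
    "reference_compare": lambda c: "color" in c.description,
}

remaining = refutation_loop(toy_propose, toy_tools)
```

In this sketch an image would be flagged as anomalous only if at least one candidate survives every round, so the final judgment rests on tool evidence rather than on a single model response.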
