VarAD: Lightweight High-Resolution Image Anomaly Detection via Visual Autoregressive Modeling
Yunkang Cao, Haiming Yao, Wei Luo, Weiming Shen

TL;DR
VarAD introduces a lightweight, high-resolution image anomaly detection method using visual autoregressive modeling, effectively capturing global information and outperforming existing methods on multiple datasets.
Contribution
The paper proposes VarAD, a novel autoregressive approach for high-resolution image anomaly detection that leverages multi-hierarchy token sequences and a lightweight model for superior performance.
Findings
Achieves state-of-the-art results on four datasets.
Maintains lightweight architecture suitable for high-resolution images.
Demonstrates effectiveness on real-world inspection data.
Abstract
This paper addresses a practical task: High-Resolution Image Anomaly Detection (HRIAD). In comparison to conventional image anomaly detection for low-resolution images, HRIAD imposes a heavier computational burden and necessitates superior global information capture capacity. To tackle HRIAD, this paper translates image anomaly detection into visual token prediction and proposes VarAD based on visual autoregressive modeling for token prediction. Specifically, VarAD first extracts multi-hierarchy and multi-directional visual token sequences, and then employs an advanced model, Mamba, for visual autoregressive modeling and token prediction. During the prediction process, VarAD effectively exploits information from all preceding tokens to predict the target token. Finally, the discrepancies between predicted tokens and original tokens are utilized to score anomalies. Comprehensive…
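The scoring idea in the abstract (predict each token from all preceding tokens, then score anomalies by the discrepancy between predicted and original tokens) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the running-mean predictor is a hypothetical stand-in for the Mamba model, squared Euclidean distance is an assumed discrepancy measure, and averaging a forward and a backward pass is only one possible reading of "multi-directional". All function names are illustrative.

```python
from typing import List, Sequence

Token = Sequence[float]


def predict_from_preceding(tokens: Sequence[Token]) -> List[List[float]]:
    """Predict each token from all tokens before it.

    ASSUMPTION: a running mean replaces the Mamba predictor so the
    sketch stays dependency-free; it is causal but not learned.
    """
    dim = len(tokens[0])
    running_sum = [0.0] * dim
    predictions: List[List[float]] = []
    for t, token in enumerate(tokens):
        if t == 0:
            # No preceding context: copy the token so its discrepancy is 0.
            predictions.append(list(token))
        else:
            predictions.append([s / t for s in running_sum])
        running_sum = [s + x for s, x in zip(running_sum, token)]
    return predictions


def discrepancy(original: Token, predicted: Token) -> float:
    """ASSUMPTION: squared Euclidean distance as the discrepancy measure."""
    return sum((x - y) ** 2 for x, y in zip(original, predicted))


def anomaly_scores(tokens: Sequence[Token]) -> List[float]:
    """Average the prediction error of a forward and a backward pass."""
    forward = predict_from_preceding(tokens)
    # Backward pass: predict on the reversed sequence, then restore order.
    backward = predict_from_preceding(tokens[::-1])[::-1]
    return [
        0.5 * (discrepancy(tok, f) + discrepancy(tok, b))
        for tok, f, b in zip(tokens, forward, backward)
    ]


if __name__ == "__main__":
    # Four "normal" tokens and one outlier at index 2.
    sequence = [[1.0, 1.0], [1.0, 1.0], [5.0, 5.0], [1.0, 1.0], [1.0, 1.0]]
    scores = anomaly_scores(sequence)
    print(scores.index(max(scores)))  # position with the highest score
```

In this toy sequence the outlier token receives the highest score because neither direction can predict it from its neighbours, which is the behaviour the abstract relies on for localizing anomalies.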
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
