Hierarchical Vector Quantized Transformer for Multi-class Unsupervised Anomaly Detection
Ruiying Lu, YuJie Wu, Long Tian, Dongsheng Wang, Bo Chen, Xiyang Liu, Ruimin Hu

TL;DR
This paper introduces a hierarchical vector quantized Transformer model for multi-class unsupervised anomaly detection, effectively capturing normal patterns as discrete prototypes to improve detection accuracy and interpretability.
Contribution
It proposes a novel hierarchical vector quantized Transformer framework that uses discrete prototypes and a prototype-oriented optimal transport method for improved multi-class anomaly detection.
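To make the optimal transport idea concrete, here is a minimal sketch of an entropy-regularized (Sinkhorn) transport cost between a set of feature vectors and a set of prototypes. It is not the paper's formulation: the function name `sinkhorn_cost`, the uniform marginals, the squared Euclidean cost, and the values of `eps` and `n_iters` are all assumptions chosen for illustration.

```python
import numpy as np

def sinkhorn_cost(features, prototypes, eps=0.5, n_iters=200):
    """Entropy-regularized OT cost between features (N, D) and prototypes (K, D).

    Hypothetical helper for illustration; uniform mass on both sides is an assumption.
    """
    n, k = len(features), len(prototypes)
    # Pairwise squared Euclidean cost matrix of shape (N, K).
    cost = ((features[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    a = np.full(n, 1.0 / n)  # mass on each feature
    b = np.full(k, 1.0 / k)  # mass on each prototype
    kernel = np.exp(-cost / eps)
    u = np.ones(n)
    v = np.ones(k)
    # Sinkhorn iterations: alternately rescale rows and columns to match the marginals.
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum()), plan

prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])
normal = np.array([[0.1, -0.1], [0.9, 1.1]])   # close to the prototypes
shifted = np.array([[3.0, 3.0], [4.0, 4.0]])   # far from the prototypes

normal_cost, _ = sinkhorn_cost(normal, prototypes)
shifted_cost, _ = sinkhorn_cost(shifted, prototypes)
print(normal_cost < shifted_cost)
```

In this toy setting, features that sit near the prototypes are cheap to transport, while features far from every prototype are expensive, which is the intuition behind using a transport cost as an anomaly score.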
Findings
Outperforms state-of-the-art methods on the MVTec-AD and VisA datasets.
Addresses codebook collapse and the "identical shortcut" issue in anomaly detection.
Provides interpretable anomaly scores with a hierarchical prototype approach.
Abstract
Unsupervised image Anomaly Detection (UAD) aims to learn robust and discriminative representations of normal samples. While separate solutions per class incur expensive computation and limited generalizability, this paper focuses on building a unified framework for multiple classes. Under such a challenging setting, popular reconstruction-based networks with a continuous latent representation assumption always suffer from the "identical shortcut" issue, where both normal and abnormal samples can be well recovered and are therefore difficult to distinguish. To address this pivotal issue, we propose a hierarchical vector quantized prototype-oriented Transformer under a probabilistic framework. First, instead of learning the continuous representations, we preserve the typical normal patterns as discrete iconic prototypes, and confirm the importance of Vector Quantization in preventing the model from…
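The vector quantization step described in the abstract can be illustrated with a minimal sketch: each feature vector is replaced by its nearest prototype from a codebook, so an input that matches no normal pattern cannot be passed through unchanged. The function name `quantize`, the toy two-entry codebook, and the use of the quantization residual as a score are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature (N, D) to its nearest codebook prototype (K, D).

    Hypothetical helper for illustration only.
    """
    # Squared Euclidean distance between every feature and every prototype: (N, K).
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)            # index of the nearest prototype per feature
    quantized = codebook[idx]          # discrete replacement for each feature
    residual = np.sqrt(d2[np.arange(len(features)), idx])  # distance to that prototype
    return quantized, idx, residual

# Toy codebook standing in for learned "normal" prototypes (assumed values).
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
features = np.array([[0.1, -0.1], [0.9, 1.1], [3.0, 3.0]])

quantized, idx, residual = quantize(features, codebook)
print(idx)       # nearest-prototype index per feature
print(residual)  # the third feature lies far from both prototypes
```

Because the third feature is snapped to a prototype that is far away, its residual is large; a continuous latent space would instead let it pass through nearly unchanged, which is the shortcut the abstract describes.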
Taxonomy
Topics: Anomaly Detection Techniques and Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Dropout · Byte Pair Encoding · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer · Absolute Position Encodings
