TL;DR
This paper introduces VT-ADL, a transformer-based model for detecting and localizing image anomalies. It combines patch embeddings with a Gaussian mixture density network, and the authors release BTAD, a new real-world industrial anomaly dataset for benchmarking.
Contribution
The paper proposes a transformer-based approach that combines a reconstruction-based method with patch embedding for anomaly detection and localization, and it releases BTAD, a new real-world industrial dataset.
Findings
Reported to outperform existing methods on the MNIST and MVTec datasets
Localizes anomalies by passing transformer-encoded patches through a Gaussian mixture density network
Provides BTAD, a new real-world dataset for industrial anomaly detection
Abstract
We present a transformer-based image anomaly detection and localization network. Our proposed model is a combination of a reconstruction-based approach and patch embedding. The use of transformer networks helps to preserve the spatial information of the embedded patches, which are later processed by a Gaussian mixture density network to localize the anomalous areas. We also publish BTAD, a real-world industrial anomaly dataset. Our results are compared with other state-of-the-art algorithms using publicly available datasets such as MNIST and MVTec.
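The localization idea in the abstract (score each patch under a Gaussian mixture and read the scores back on the patch grid) can be illustrated with a minimal sketch. This is not the authors' implementation: the transformer encoder is replaced here by a trivial stand-in feature (the mean intensity of each patch), the mixture parameters are set by hand rather than predicted by a network, and all function names (`split_into_patches`, `gmm_nll`, `anomaly_map`) are hypothetical.

```python
import math


def split_into_patches(image, patch):
    """Split a 2D list (H x W) into a grid of flattened, non-overlapping patches."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    grid = []
    for r in range(0, h, patch):
        row = []
        for c in range(0, w, patch):
            row.append([image[r + i][c + j] for i in range(patch) for j in range(patch)])
        grid.append(row)
    return grid


def gmm_nll(x, weights, means, stds):
    """Negative log-likelihood of a scalar x under a 1D Gaussian mixture."""
    log_terms = [
        math.log(w) - math.log(s) - 0.5 * math.log(2 * math.pi) - 0.5 * ((x - m) / s) ** 2
        for w, m, s in zip(weights, means, stds)
    ]
    top = max(log_terms)  # log-sum-exp for numerical stability
    return -(top + math.log(sum(math.exp(t - top) for t in log_terms)))


def anomaly_map(image, patch, weights, means, stds):
    """Per-patch anomaly scores laid out on the patch grid.

    Assumption: the patch feature is its mean intensity, a stand-in for the
    transformer-encoded patch embedding described in the abstract.
    """
    grid = split_into_patches(image, patch)
    return [[gmm_nll(sum(p) / len(p), weights, means, stds) for p in row] for row in grid]


# Toy 8x8 image: uniform background with a bright "defect" in the bottom-right patch.
image = [[0.1] * 8 for _ in range(8)]
for r in range(4, 8):
    for c in range(4, 8):
        image[r][c] = 0.9

# Hand-set mixture describing "normal" patches (hypothetical values).
scores = anomaly_map(image, 4, weights=[0.5, 0.5], means=[0.1, 0.12], stds=[0.05, 0.05])
```

In this toy setup the bottom-right entry of `scores` is the largest, since that patch is far from both mixture components. Because the scores keep the layout of the patch grid, a high score points directly at the image region it came from, which is the property the abstract attributes to preserving the spatial information of the patches.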