Masked Transformer for image Anomaly Localization

Axel De Nardin; Pankaj Mishra; Gian Luca Foresti; Claudio Piciarelli

arXiv:2210.15540 · cs.CV · October 28, 2022

TL;DR

This paper introduces a Vision Transformer-based model with patch masking for image anomaly localization. It improves detection accuracy by reconstructing each patch from its surrounding context rather than from the patch itself.

Contribution

The paper proposes a novel transformer architecture utilizing multi-resolution patch masking for more effective anomaly detection in images.

Findings

01 Achieved competitive results on MVTec and head CT datasets.

02 Multi-resolution patches significantly improve detection performance.

03 Outperforms several existing state-of-the-art methods.

Abstract

Image anomaly detection consists of detecting images or image portions that are visually different from the majority of the samples in a dataset. The task is of practical importance for various real-life applications such as biomedical image analysis, visual inspection in industrial production, banking, and traffic management. Most current deep learning approaches rely on image reconstruction: the input image is projected into a latent space and then reconstructed, on the assumption that the network (mostly trained on normal data) will not be able to reconstruct the anomalous portions. However, this assumption does not always hold. We thus propose a new model based on the Vision Transformer architecture with patch masking: the input image is split into several patches, and each patch is reconstructed only from the surrounding data, thus ignoring the potentially anomalous information…
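The masking idea in the abstract can be illustrated with a small sketch. This is not the authors' model: the Vision Transformer is replaced by a deliberately naive stand-in that predicts each patch as the mean of its neighbouring patches, and the function names (`split_into_patches`, `neighbour_mean_reconstruct`, `anomaly_map`), the patch size, and the synthetic image are all assumptions made for illustration. What it shows is only the principle that a patch's score comes from a reconstruction that never reads the patch itself.

```python
import numpy as np

def split_into_patches(image, patch):
    """Split an (H, W) image into a (H//patch, W//patch, patch, patch) grid."""
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    return image.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)

def neighbour_mean_reconstruct(patches, i, j):
    """Hypothetical stand-in for the transformer: predict patch (i, j) as the
    mean of its adjacent patches, never reading patch (i, j) itself."""
    rows, cols = patches.shape[:2]
    neighbours = [
        patches[r, c]
        for r in range(max(i - 1, 0), min(i + 2, rows))
        for c in range(max(j - 1, 0), min(j + 2, cols))
        if (r, c) != (i, j)  # the masked patch is excluded from its own context
    ]
    return np.mean(neighbours, axis=0)

def anomaly_map(image, patch=4):
    """Per-patch mean squared error between each patch and its masked reconstruction."""
    patches = split_into_patches(image, patch)
    rows, cols = patches.shape[:2]
    scores = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            recon = neighbour_mean_reconstruct(patches, i, j)
            scores[i, j] = np.mean((patches[i, j] - recon) ** 2)
    return scores

# Synthetic example: a flat image with one bright defect covering patch (1, 2).
image = np.zeros((16, 16))
image[4:8, 8:12] = 1.0
scores = anomaly_map(image, patch=4)
peak = np.unravel_index(scores.argmax(), scores.shape)
print("highest-scoring patch:", peak)
```

Because the defective patch is hidden from its own reconstruction, the stand-in predicts it from the normal background and the error peaks there. A learned model trained on normal images would take the place of `neighbour_mean_reconstruct`.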

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Adam · Label Smoothing · Position-Wise Feed-Forward Layer · Dense Connections · Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Residual Connection