Learning to Be a Transformer to Pinpoint Anomalies

Alex Costanzino; Pierluigi Zama Ramirez; Giuseppe Lisanti; Luigi Di Stefano

arXiv:2407.04092 · cs.CV · June 27, 2025

TL;DR

This paper introduces a Teacher-Student framework using high-resolution images and pre-trained features to improve anomaly detection and segmentation, especially for tiny defects, with faster processing and state-of-the-art results.

Contribution

The paper presents a novel Teacher-Student paradigm leveraging pre-trained vision Transformers and shallow MLPs to enhance high-resolution anomaly detection and segmentation.

Findings

01. Achieves state-of-the-art performance on MVTec AD.

02. Runs significantly faster than existing methods.

03. Excels at detecting both large and tiny anomalies.

Abstract

To efficiently deploy strong, often pre-trained feature extractors, recent Industrial Anomaly Detection and Segmentation (IADS) methods process low-resolution images, e.g., 224x224 pixels, obtained by downsampling the original input images. However, while numerous industrial applications demand the identification of both large and small defects, downsampling the input image to a low resolution may hinder a method's ability to pinpoint tiny anomalies. We propose a novel Teacher-Student paradigm to leverage strong pre-trained features while processing high-resolution input images very efficiently. The core idea concerns training two shallow MLPs (the Students) by nominal images so as to mimic the mappings between the patch embeddings induced by the self-attention layers of a frozen vision Transformer (the Teacher). Indeed, learning these mappings sets forth a challenging pretext task…
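
The core idea in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation. Several parts are assumptions made only for illustration: the frozen Teacher is replaced by a fixed random nonlinear map (`teacher`) rather than a real vision Transformer; the two Students are taken to be one MLP mapping earlier-layer patch embeddings to later-layer ones and one mapping in the reverse direction; and the anomaly score is taken to be the Students' per-patch prediction error. All names and sizes (`DIM`, `HIDDEN`, `train_student`, `anomaly_score`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, HIDDEN = 8, 32  # toy embedding and hidden sizes (hypothetical)

# Hypothetical stand-in for the frozen Teacher: a fixed, never-trained nonlinear
# map from "earlier-layer" to "later-layer" patch embeddings. Not a real ViT.
A = rng.normal(size=(DIM, DIM)) / np.sqrt(DIM)
B = rng.normal(size=(DIM, DIM)) / np.sqrt(DIM)


def teacher(x):
    return x @ A + 0.5 * (x ** 2) @ B


def train_student(inputs, targets, steps=500, lr=0.1):
    """Fit a one-hidden-layer MLP (a Student) with plain gradient descent."""
    w1 = rng.normal(size=(inputs.shape[1], HIDDEN)) / np.sqrt(inputs.shape[1])
    b1 = np.zeros(HIDDEN)
    w2 = rng.normal(size=(HIDDEN, targets.shape[1])) / np.sqrt(HIDDEN)
    b2 = np.zeros(targets.shape[1])
    for _ in range(steps):
        h = np.tanh(inputs @ w1 + b1)
        err = h @ w2 + b2 - targets
        d_out = 2.0 * err / err.size             # gradient of the mean squared error
        d_hid = (d_out @ w2.T) * (1.0 - h ** 2)  # backpropagate through tanh
        w2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        w1 -= lr * (inputs.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return lambda x: np.tanh(x @ w1 + b1) @ w2 + b2


# Train both Students on nominal (defect-free) patch embeddings only.
x_train = 0.5 * rng.normal(size=(512, DIM))
y_train = teacher(x_train)
forward_student = train_student(x_train, y_train)   # earlier layer -> later layer
backward_student = train_student(y_train, x_train)  # later layer -> earlier layer


def anomaly_score(x):
    """Per-patch score: how badly the Students reproduce the Teacher's mappings."""
    y = teacher(x)
    fwd = ((forward_student(x) - y) ** 2).sum(axis=1)
    bwd = ((backward_student(y) - x) ** 2).sum(axis=1)
    return fwd + bwd


nominal_scores = anomaly_score(0.5 * rng.normal(size=(64, DIM)))
anomalous_scores = anomaly_score(3.0 + 0.5 * rng.normal(size=(64, DIM)))
print(nominal_scores.mean(), anomalous_scores.mean())
```

In this toy setting the Students only see nominal embeddings, so patches drawn far from that distribution are expected to receive larger scores. How the paper actually selects layers, builds the Students, and combines their errors is not specified in the truncated abstract above.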

Taxonomy

Topics: Industrial Vision Systems and Defect Detection

Methods: Attention Is All You Need · Softmax · Layer Normalization · Focus · Linear Layer · Dense Connections · Residual Connection · Multi-Head Attention · Vision Transformer