TL;DR
Transformaly enhances anomaly detection by combining pre-trained Vision Transformer features with teacher-student fine-tuned features, leveraging normal sample information for improved detection accuracy.
Contribution
The paper introduces a novel method that uses a teacher-student training approach to extract complementary features from a pre-trained ViT for anomaly detection.
Findings
Achieves state-of-the-art AUROC results in unimodal settings.
Outperforms existing methods in multimodal anomaly detection.
Effectively utilizes normal samples to improve detection accuracy.
Abstract
Anomaly detection is a well-established research area that seeks to identify samples outside of a predetermined distribution. An anomaly detection pipeline comprises two main stages: (1) feature extraction and (2) normality score assignment. Recent papers used pre-trained networks for feature extraction, achieving state-of-the-art results. However, the use of pre-trained networks does not fully utilize the normal samples that are available at train time. This paper suggests taking advantage of this information by using teacher-student training. In our setting, a pre-trained teacher network is used to train a student network on the normal training samples. Since the student network is trained only on normal samples, it is expected to deviate from the teacher network in abnormal cases. This difference can serve as a complementary representation to the pre-trained feature vector. Our…
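The teacher-student idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation. The "teacher" here is a fixed random nonlinear map standing in for a frozen pre-trained ViT, the "student" is a linear model fitted by least squares, and the names `teacher`, `student`, and `anomaly_score` are hypothetical. Scoring by squared distance to the mean of normal features, and summing the two terms, are simplifying assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a frozen pre-trained teacher network:
# a fixed random projection followed by a tanh nonlinearity.
W_teacher = rng.normal(size=(8, 16))

def teacher(x):
    return np.tanh(x @ W_teacher)

# Normal training samples only (assumed to cluster near the origin).
x_train = rng.normal(loc=0.0, scale=0.3, size=(500, 8))

# Student: a linear model fitted to mimic the teacher on normal samples.
# It has never seen abnormal inputs, so it should mimic the teacher
# poorly on them.
W_student, *_ = np.linalg.lstsq(x_train, teacher(x_train), rcond=None)

def student(x):
    return x @ W_student

# Statistics of the pre-trained (teacher) features on normal data.
mu = teacher(x_train).mean(axis=0)

def anomaly_score(x):
    """Higher means more anomalous (illustrative scoring, an assumption)."""
    t = teacher(x)
    # Term 1: how far the pre-trained features are from the normal mean.
    pretrained_term = np.sum((t - mu) ** 2, axis=1)
    # Term 2: teacher-student discrepancy, the complementary signal.
    discrepancy_term = np.sum((t - student(x)) ** 2, axis=1)
    return pretrained_term + discrepancy_term

x_normal = rng.normal(loc=0.0, scale=0.3, size=(200, 8))
x_abnormal = rng.normal(loc=3.0, scale=0.3, size=(200, 8))
scores_normal = anomaly_score(x_normal)
scores_abnormal = anomaly_score(x_abnormal)
print(scores_normal.mean(), scores_abnormal.mean())
```

In this toy setup the student matches the teacher closely on inputs resembling the training data and diverges on shifted inputs, so the discrepancy term adds a signal beyond the pre-trained features alone.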
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Position-Wise Feed-Forward Layer · Layer Normalization · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Vision Transformer
