PViT: Prior-augmented Vision Transformer for Out-of-distribution Detection
Tianhao Zhang, Zhixiang Chen, Lyudmila S. Mihaylova

TL;DR
PViT is a framework that improves the robustness of Vision Transformers for out-of-distribution detection by taking prior class logits from a pretrained model as input. It is reported to outperform existing methods on large-scale benchmarks without extra data modeling.
Contribution
Introduces PViT, a generic prior-augmented Vision Transformer that improves OOD detection by using prior logits, without additional data modeling or structural changes.
Findings
PViT outperforms SOTA methods on the ImageNet benchmark.
PViT achieves lower FPR95 and higher AUROC than the compared methods.
The approach does not require extra data generation or structural modifications.
Abstract
Vision Transformers (ViTs) have achieved remarkable success across various vision tasks, yet their robustness against data distribution shifts and their inherent inductive biases remain underexplored. To enhance the robustness of ViT models for image Out-of-Distribution (OOD) detection, we introduce a novel and generic framework named Prior-augmented Vision Transformer (PViT). Taking as input the prior class logits from a pretrained model, we train PViT to predict the class logits. During inference, PViT identifies OOD samples by quantifying the divergence between the predicted class logits and the prior logits obtained from pretrained models. Unlike existing state-of-the-art (SOTA) OOD detection methods, PViT shapes the decision boundary between ID and OOD by utilizing the proposed prior-guided confidence, without requiring additional data modeling, generation methods, or structural…
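The inference step described in the abstract (comparing predicted class logits against prior logits) can be illustrated with a minimal sketch. The abstract does not give the exact form of the prior-guided confidence, so the KL divergence over softmax distributions, the function names `divergence_score` and `is_ood`, the threshold value, and the choice to flag large disagreement as OOD are all assumptions made here for illustration, not the authors' method.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def divergence_score(prior_logits, predicted_logits):
    """Hypothetical score: KL divergence between the softmax of the prior
    logits and the softmax of the predicted logits. This is an assumed
    stand-in for the paper's prior-guided confidence."""
    return kl_divergence(softmax(prior_logits), softmax(predicted_logits))

def is_ood(prior_logits, predicted_logits, threshold=0.5):
    """Assumed decision rule: flag a sample when the score exceeds a
    threshold. In practice the threshold would be tuned on held-out data."""
    return divergence_score(prior_logits, predicted_logits) > threshold

# Illustrative logits only; these are not results from the paper.
prior = [4.0, 0.5, 0.1]
agreeing_prediction = [3.8, 0.6, 0.2]
disagreeing_prediction = [0.3, 0.2, 3.5]

print(divergence_score(prior, agreeing_prediction))
print(divergence_score(prior, disagreeing_prediction))
```

The sketch only shows the shape of a divergence-based decision rule; the actual measure and its direction should be taken from the full paper.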
Taxonomy
Topics: CCD and CMOS Imaging Sensors
Methods: Linear Layer · Label Smoothing · Byte Pair Encoding · Multi-Head Attention · Softmax · Adam · Dropout · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Transformer
