PViT: Prior-augmented Vision Transformer for Out-of-distribution Detection

Tianhao Zhang; Zhixiang Chen; Lyudmila S. Mihaylova

arXiv:2410.20631 · cs.CV · January 15, 2025

TL;DR

PViT is a framework that improves the robustness of Vision Transformers for out-of-distribution (OOD) detection by taking prior class logits from a pretrained model as input. It outperforms existing methods on the ImageNet benchmark without requiring extra data modeling.

Contribution

Introduces PViT, a generic prior-augmented Vision Transformer that improves OOD detection by using prior logits, without additional data modeling or structural changes.

Findings

01. PViT outperforms SOTA methods on the ImageNet benchmark.

02. PViT achieves lower FPR95 and higher AUROC scores than the compared methods.

03. The approach does not require extra data generation or structural modifications.
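
The two metrics named in the findings can be illustrated with a minimal sketch. It assumes the common OOD-detection convention that in-distribution (ID) samples are the positive class and that a higher score means "more likely ID"; the function names and toy scores are hypothetical and are not taken from the paper or its repository.

```python
import math


def auroc(id_scores, ood_scores):
    """Probability that a random ID sample scores above a random OOD sample.

    Ties count as half a win. Quadratic in the number of samples, which is
    fine for a small illustration but not for a full benchmark.
    """
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted as ID when at least 95% of ID samples are accepted."""
    ranked = sorted(id_scores, reverse=True)
    k = math.ceil(0.95 * len(ranked))   # number of ID samples that must pass
    threshold = ranked[k - 1]           # lowest score among those k samples
    accepted_ood = sum(s >= threshold for s in ood_scores)
    return accepted_ood / len(ood_scores)


# Made-up scores, for illustration only.
toy_id = [0.9, 0.8, 0.7, 0.6]
toy_ood = [0.65, 0.1]
toy_auroc = auroc(toy_id, toy_ood)
toy_fpr95 = fpr_at_95_tpr(toy_id, toy_ood)
```

A lower FPR95 means fewer OOD samples slip through at a fixed ID acceptance rate, and a higher AUROC means the score separates the two groups better across all thresholds.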

Abstract

Vision Transformers (ViTs) have achieved remarkable success across various vision tasks, yet their robustness against data distribution shifts and inherent inductive biases remain underexplored. To enhance the robustness of ViT models for image Out-of-Distribution (OOD) detection, we introduce a novel and generic framework named Prior-augmented Vision Transformer (PViT). Taking as input the prior class logits from a pretrained model, we train PViT to predict the class logits. During inference, PViT identifies OOD samples by quantifying the divergence between the predicted class logits and the prior logits obtained from pretrained models. Unlike existing state-of-the-art (SOTA) OOD detection methods, PViT shapes the decision boundary between ID and OOD by utilizing the proposed prior-guided confidence, without requiring additional data modeling, generation methods, or structural…
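
The inference step described in the abstract (comparing predicted class logits against prior logits) can be sketched roughly as follows. The abstract does not specify the divergence measure or the exact form of the prior-guided confidence, so this sketch assumes a KL divergence between the two softmax distributions and a fixed threshold; `divergence_score`, `is_ood`, the threshold value, and the toy logits are all hypothetical and should not be read as the authors' formulation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def divergence_score(prior_logits, predicted_logits):
    """Assumed OOD score: KL between prior and predicted class distributions.

    A larger value means the prior-augmented model disagrees more with the
    pretrained prior, which this sketch treats as evidence of an OOD input.
    """
    return kl_divergence(softmax(prior_logits), softmax(predicted_logits))


def is_ood(prior_logits, predicted_logits, threshold=0.5):
    """Flag a sample as OOD when the divergence exceeds an assumed threshold."""
    return divergence_score(prior_logits, predicted_logits) > threshold


# Made-up 3-class logits, for illustration only.
prior = [4.0, 0.5, 0.1]
agree = divergence_score(prior, [3.8, 0.6, 0.2])      # prediction follows the prior
disagree = divergence_score(prior, [0.3, 0.4, 0.5])   # prediction departs from the prior
```

In practice the threshold would be chosen on held-out ID data, for example so that 95% of ID samples are accepted.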

Code & Models

Repositories

RanchoGoose/PViT (PyTorch, official)

Taxonomy

Topics: CCD and CMOS Imaging Sensors

Methods: Linear Layer · Label Smoothing · Byte Pair Encoding · Multi-Head Attention · Softmax · Adam · Dropout · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Transformer