How to train your ViT for OOD Detection
Maximilian Mueller, Matthias Hein

TL;DR
This paper investigates how different pretraining and finetuning schemes affect the ability of Vision Transformers (ViTs) to detect out-of-distribution (OOD) samples, and derives best practices for improving OOD detection performance.
Contribution
The paper systematically analyzes the impact of pretraining and finetuning methods on ViT OOD detection and proposes a best-practice training recipe.
Findings
Pretraining type significantly influences OOD detection performance.
Certain training schemes are effective only for specific out-distribution types.
A recommended training recipe improves ViT OOD detection across benchmarks.
Abstract
VisionTransformers have been shown to be powerful out-of-distribution detectors for ImageNet-scale settings when finetuned from publicly available checkpoints, often outperforming other model types on popular benchmarks. In this work, we investigate the impact of both the pretraining and finetuning scheme on the performance of ViTs on this task by analyzing a large pool of models. We find that the exact type of pretraining has a strong impact on which method works well and on OOD detection performance in general. We further show that certain training schemes might only be effective for a specific type of out-distribution, but not in general, and identify a best-practice training recipe.
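The abstract does not say which detection methods or metrics are compared. As a purely illustrative sketch of what scoring a classifier for OOD detection can look like, the snippet below assumes the common maximum-softmax-probability (MSP) baseline and AUROC as the evaluation metric. Neither is confirmed by the text above, and the function names and toy logits are hypothetical.

```python
import numpy as np

def msp_score(logits):
    """Maximum softmax probability per sample; higher suggests in-distribution.

    Assumption: MSP is used here only as a generic baseline score.
    """
    z = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs.max(axis=1)

def auroc(scores_in, scores_out):
    """Fraction of (in, out) pairs where the in-distribution sample scores higher.

    Ties count as 0.5. This pairwise form is quadratic in memory, so it is
    only suitable for small toy inputs.
    """
    diff = scores_in[:, None] - scores_out[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Hypothetical logits: confident predictions for in-distribution inputs,
# flatter predictions for out-of-distribution inputs.
logits_in = np.array([[6.0, 0.0, 0.0], [0.0, 5.0, 1.0]])
logits_out = np.array([[1.0, 1.0, 1.0], [2.0, 1.5, 1.0]])

auc = auroc(msp_score(logits_in), msp_score(logits_out))
print(f"AUROC: {auc:.3f}")
```

In practice the logits would come from a finetuned ViT evaluated on an in-distribution test set and one or more out-distribution datasets. Scoring each out-distribution separately is one way to check whether a training scheme helps only for a specific type of out-distribution.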
Taxonomy
Topics: Anomaly Detection Techniques and Applications
