Vision Transformer Neural Architecture Search for Out-of-Distribution Generalization: Benchmark and Insights
Sy-Tuyen Ho, Tuan Van Vo, Somayeh Ebrahimkhani, Ngai-Man Cheung

TL;DR
This paper introduces OoD-ViT-NAS, a comprehensive benchmark for evaluating Vision Transformer architectures on out-of-distribution generalization, revealing key factors influencing robustness and the limited effectiveness of existing NAS methods.
Contribution
It presents the first systematic benchmark for ViT NAS focused on OoD generalization, analyzes factors affecting robustness, and evaluates the performance of training-free NAS methods.
Findings
ViT architecture design significantly impacts OoD generalization.
In-distribution (ID) accuracy is a poor predictor of OoD accuracy.
Simple proxies like parameter count outperform complex NAS methods in predicting OoD robustness.
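The last two findings are claims about how well one quantity ranks architectures by OoD accuracy. One common way to check such a claim is a rank correlation. The sketch below computes Kendall's tau between a proxy (parameter count or ID accuracy) and OoD accuracy. It is a minimal illustration, not the paper's evaluation code: the choice of Kendall's tau is an assumption, the helper name `kendall_tau` is hypothetical, and every number in it is made up for demonstration rather than taken from the benchmark.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1      # both quantities order this pair the same way
        elif sign < 0:
            discordant += 1      # the two quantities disagree on this pair
        # ties (sign == 0) count toward neither total
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical toy values for five architectures (NOT benchmark results).
params_m = [6.0, 9.5, 22.0, 31.0, 54.0]      # parameter count, in millions
id_acc = [72.1, 74.0, 73.5, 75.2, 74.8]      # in-distribution accuracy (%)
ood_acc = [30.2, 33.1, 38.4, 41.0, 45.5]     # out-of-distribution accuracy (%)

tau_params = kendall_tau(params_m, ood_acc)  # how well size ranks OoD accuracy
tau_id = kendall_tau(id_acc, ood_acc)        # how well ID accuracy ranks it

print(f"tau(params, OoD) = {tau_params:.2f}")
print(f"tau(ID acc, OoD) = {tau_id:.2f}")
```

A tau near 1 means the proxy orders architectures almost exactly as OoD accuracy does; a tau near 0 means it carries little ranking information. The same function could be applied to the score produced by any training-free NAS method to compare it against parameter count.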
Abstract
While ViTs have achieved success across machine learning tasks, deploying them in real-world scenarios faces a critical challenge: generalizing under OoD shifts. A crucial research gap exists in understanding how to design ViT architectures, both manually and automatically, for better OoD generalization. To this end, we introduce OoD-ViT-NAS, the first systematic benchmark for ViT NAS focused on OoD generalization. This benchmark includes 3000 ViT architectures of varying computational budgets evaluated on 8 common OoD datasets. Using this benchmark, we analyze factors contributing to OoD generalization. Our findings reveal key insights. First, ViT architecture designs significantly affect OoD generalization. Second, ID accuracy is often a poor indicator of OoD accuracy, highlighting the risk of optimizing ViT architectures solely for ID performance. Third, we perform the first study of NAS…
Taxonomy
Topics: Image Processing Techniques and Applications · Neural Networks and Applications · CCD and CMOS Imaging Sensors
