OVO: One-shot Vision Transformer Search with Online distillation
Zimian Wei, Hengyue Pan, Xin Niu, Dongsheng Li

TL;DR
This paper introduces OVO, a one-shot vision transformer search framework utilizing online distillation to efficiently train multiple subnets, achieving high accuracy on ImageNet and CIFAR-100 without extra finetuning.
Contribution
The paper proposes a novel one-shot transformer search method with online distillation that trains numerous subnets simultaneously, improving performance without additional retraining.
Findings
OVO-Ti achieves 73.32% top-1 accuracy on ImageNet
OVO-Ti attains 75.2% accuracy on CIFAR-100
Thousands of subnets in the supernet are trained without extra finetuning or retraining
Abstract
Pure transformers have recently shown great potential for vision tasks. However, their accuracy on small or medium datasets is not satisfactory. Although some existing methods introduce a CNN as a teacher to guide the training process by distillation, the gap between teacher and student networks can lead to sub-optimal performance. In this work, we propose a new One-shot Vision transformer search framework with Online distillation, namely OVO. OVO samples sub-nets for both teacher and student networks for better distillation results. Benefiting from the online distillation, thousands of subnets in the supernet are well-trained without extra finetuning or retraining. In experiments, OVO-Ti achieves 73.32% top-1 accuracy on ImageNet and 75.2% on CIFAR-100.
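The idea of sampling both teacher and student from the same supernet and distilling between them can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not taken from the paper: the `SEARCH_SPACE` dimensions and values, the helper names `sample_subnet` and `distillation_loss`, and the loss itself, which is the standard cross-entropy plus temperature-scaled KL form of knowledge distillation and may differ from what OVO actually uses. The logits are hard-coded stand-ins for the outputs of two sampled subnets.

```python
import math
import random

# Hypothetical search space; the actual OVO choices are not given in this summary.
SEARCH_SPACE = {
    "depth": [12, 13, 14],
    "embed_dim": [192, 216, 240],
    "num_heads": [3, 4],
    "mlp_ratio": [3.5, 4.0],
}

def sample_subnet(rng):
    """Pick one value per dimension to define a subnet configuration."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=1.0):
    """Cross-entropy on the hard label plus KL(teacher || student) on soft outputs.

    This is the generic distillation objective, assumed here for illustration.
    """
    ce = -math.log(softmax(student_logits)[label])
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student) if t > 0)
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl

rng = random.Random(0)
teacher_cfg = sample_subnet(rng)  # teacher subnet drawn from the supernet
student_cfg = sample_subnet(rng)  # student subnet drawn from the same supernet

# Stand-in logits; in practice these come from forward passes of the two subnets.
loss = distillation_loss([2.0, 0.5, -1.0], [2.5, 0.2, -0.8], label=0)
```

In a real training loop, a fresh teacher and student pair would be sampled at each step and the loss backpropagated through the shared supernet weights; how the teacher is updated is a design choice this sketch does not cover.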
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
Methods: Attention Is All You Need · Softmax · Dense Connections · Linear Layer · Layer Normalization · Residual Connection · Multi-Head Attention · Vision Transformer
