TVT: Training-Free Vision Transformer Search on Tiny Datasets
Zimian Wei, Hengyue Pan, Lujun Li, Peijie Dong, Zhiliang Tian, Xin Niu, Dongsheng Li

TL;DR
This paper introduces TVT, a training-free method for searching for strong Vision Transformer architectures on tiny datasets. It leverages ConvNet teacher models and new zero-cost metrics, and it outperforms existing training-free search methods without training any candidate architecture during the search.
Contribution
The paper proposes a novel training-free ViT search framework that incorporates teacher-aware attention metrics and student-capability measures, improving search effectiveness on small datasets.
Findings
TVT outperforms existing training-free search methods.
The similarity of attention maps between the ViT student and its ConvNet teacher notably affects distillation accuracy, which motivates the teacher-aware metric.
The approach is effective across various tiny datasets and search spaces.
Abstract
Training-free Vision Transformer (ViT) architecture search aims to find better ViTs using zero-cost proxies. ViTs achieve significant distillation gains from CNN teacher models on small datasets. However, our experimental observations show that current zero-cost proxies for ViTs do not generalize well to the distillation training paradigm. In this paper, we investigate for the first time how to search in a training-free manner with the help of teacher models, and we devise an effective Training-free ViT (TVT) search framework. First, we observe that the similarity of attention maps between a ViT and its ConvNet teacher notably affects distillation accuracy. We therefore present a teacher-aware metric conditioned on the feature attention relations between teacher and student. Additionally, TVT employs the L2-Norm of the student's weights as the student-capability metric to improve…
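The summary does not define how the two metrics are computed or combined. The sketch below shows one possible reading, and every piece of it is an assumption.

- **Attention maps.** Spatial attention maps are built in the style of attention transfer: the channel mean of squared activations, L2-normalised.
- **Token layout.** The ViT's patch tokens are assumed to be reshaped to the teacher's H x W grid already.
- **Similarity.** Teacher-student similarity is measured with the cosine.
- **Fusion.** The two scores are combined by summing ranks.

All function names (`spatial_attention`, `teacher_aware_score`, `l2_norm_score`, `combined_ranking`) are hypothetical. This is not the authors' implementation.

```python
import math
from typing import List, Sequence

# A feature map is C channels, each an H x W grid of floats.
FeatureMap = List[List[List[float]]]


def spatial_attention(features: FeatureMap) -> List[float]:
    """Flattened, L2-normalised spatial attention map.

    ASSUMPTION: attention-transfer style map (mean over channels of squared
    activations); the TVT summary does not define the exact form.
    """
    h, w = len(features[0]), len(features[0][0])
    att = [
        sum(ch[i][j] ** 2 for ch in features) / len(features)
        for i in range(h)
        for j in range(w)
    ]
    norm = math.sqrt(sum(a * a for a in att)) or 1.0  # avoid divide-by-zero
    return [a / norm for a in att]


def teacher_aware_score(student: FeatureMap, teacher: FeatureMap) -> float:
    """Cosine similarity of student and teacher attention maps.

    ASSUMPTION: ViT patch tokens were already reshaped (and resized) to the
    teacher's H x W grid before calling this.
    """
    s, t = spatial_attention(student), spatial_attention(teacher)
    return sum(a * b for a, b in zip(s, t))


def l2_norm_score(weights: Sequence[Sequence[float]]) -> float:
    """Student-capability proxy: L2 norm over all flattened weight tensors."""
    return math.sqrt(sum(x * x for tensor in weights for x in tensor))


def rank(values: Sequence[float]) -> List[int]:
    """Rank of each value, 0 = smallest."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks


def combined_ranking(att_scores: Sequence[float],
                     norm_scores: Sequence[float]) -> List[int]:
    """HYPOTHETICAL fusion: sum the two rank vectors (higher is better) and
    return candidate indices ordered best first."""
    total = [a + b for a, b in zip(rank(att_scores), rank(norm_scores))]
    return sorted(range(len(total)), key=lambda i: total[i], reverse=True)


# Toy usage: three candidate students scored against one teacher.
teacher = [[[1.0, 0.0], [0.0, 1.0]]]
students = [
    [[[0.0, 1.0], [1.0, 0.0]]],   # attention on the opposite positions
    [[[1.0, 0.0], [0.0, 1.0]]],   # same pattern as the teacher
    [[[1.0, 1.0], [1.0, 1.0]]],   # uniform attention
]
student_weights = [[[0.5, 0.5]], [[1.0, 2.0]], [[1.0, 1.0]]]
att = [teacher_aware_score(s, teacher) for s in students]
norms = [l2_norm_score(w) for w in student_weights]
best_first = combined_ranking(att, norms)  # candidate indices, best first
```

Fusing ranks rather than raw values avoids the scale mismatch between a cosine score in [-1, 1] and an unbounded weight norm. A weighted sum of normalised scores would be an equally plausible alternative, and the paper may use either or neither.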
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Infrared Target Detection Methodologies
Methods: Multi-Head Attention · Attention Is All You Need · Byte Pair Encoding · Dropout · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Adam · Transformer · Dense Connections
