Training-free Transformer Architecture Search

Qinqin Zhou; Kekai Sheng; Xiawu Zheng; Ke Li; Xing Sun; Yonghong Tian; Jie Chen; Rongrong Ji

arXiv:2203.12217·cs.CV·March 24, 2022

TL;DR

This paper introduces a training-free method for Transformer architecture search in vision tasks, significantly reducing search time and outperforming existing zero-cost proxies.

Contribution

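It proposes the first training-free Transformer architecture search method. Its novel DSS-indicator ranks ViT architectures by two synaptic properties, synaptic diversity and synaptic saliency. This makes the search far cheaper and ranks architectures more accurately than existing zero-cost proxies.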
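The DSS-indicator scores an untrained network by combining a synaptic-diversity term for the MSA weights with a synaptic-saliency term for the MLP weights. The sketch below approximates that idea under stated assumptions and is not the authors' released code. It takes each layer's weight matrix W and its loss gradient dL/dW, obtained from one backward pass in any framework. It then uses the product of nuclear norms as a rank-based diversity measure for MSA, and the summed magnitude of W ⊙ dL/dW (a SNIP-style saliency) for MLP. The function names and the plain sum of the two terms are hypothetical choices.

```python
import numpy as np


def synaptic_diversity(weight: np.ndarray, grad: np.ndarray) -> float:
    """MSA term (assumed form): ||W||_nuc * ||dL/dW||_nuc.

    The nuclear norm (sum of singular values) is a smooth proxy for matrix
    rank, i.e. how diverse the directions encoded by the weight are.
    """
    return float(np.linalg.norm(weight, ord="nuc") * np.linalg.norm(grad, ord="nuc"))


def synaptic_saliency(weight: np.ndarray, grad: np.ndarray) -> float:
    """MLP term (assumed form): sum over entries of |W * dL/dW|."""
    return float(np.abs(weight * grad).sum())


def dss_score(msa_layers, mlp_layers) -> float:
    """Hypothetical DSS-style score for one architecture.

    Each argument is a list of (weight, grad) pairs: MSA projection
    matrices go in msa_layers, MLP linear weights go in mlp_layers.
    The two module-wise terms are simply added (an assumption).
    """
    diversity = sum(synaptic_diversity(w, g) for w, g in msa_layers)
    saliency = sum(synaptic_saliency(w, g) for w, g in mlp_layers)
    return diversity + saliency


# Usage with random stand-ins for real weights and gradients.
rng = np.random.default_rng(0)
msa = [(rng.standard_normal((8, 8)), rng.standard_normal((8, 8)))]
mlp = [(rng.standard_normal((32, 8)), rng.standard_normal((32, 8)))]
print(dss_score(msa, mlp))
```

Scoring MSA and MLP modules with different terms mirrors the observation that the two module types behave differently. A single score then lets candidates be ranked without any training.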

Findings

1. Achieves competitive performance with state-of-the-art ViT architectures.
2. Reduces search time from 24 GPU days to less than 0.5 GPU days.
3. DSS-indicator outperforms existing zero-cost proxies.
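One proxy "outperforms" another when its scores rank architectures more consistently with their accuracy after training. Rank correlation, such as Kendall's τ, is a common yardstick for this in the zero-cost proxy literature. Which statistics the paper reports is not stated in this summary. The sketch below computes Kendall's tau-a in plain Python. The inputs would be one proxy score and one measured accuracy per sampled architecture. The numbers in the usage line are made up for illustration.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(proxy_scores: Sequence[float], accuracies: Sequence[float]) -> float:
    """Kendall's tau-a between two equally long score lists.

    +1 means the proxy orders every pair of architectures the same way as
    their measured accuracy; -1 means it orders every pair the opposite way.
    Tied pairs count as neither concordant nor discordant.
    """
    n = len(proxy_scores)
    if n != len(accuracies) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (proxy_scores[i] - proxy_scores[j]) * (accuracies[i] - accuracies[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Made-up toy values: 5 concordant pairs, 1 discordant pair out of 6.
print(kendall_tau([0.2, 0.5, 0.9, 0.4], [0.60, 0.70, 0.80, 0.75]))  # prints 0.6666666666666666
```

A higher τ for one proxy than another, on the same set of architectures, means it would pick better models without training.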

Abstract

Recently, Vision Transformer (ViT) has achieved remarkable success in several computer vision tasks. This progress is closely tied to architecture design, so it is worthwhile to propose Transformer Architecture Search (TAS) to search for better ViTs automatically. However, current TAS methods are time-consuming, and, according to our experimental observations, existing zero-cost proxies designed for CNNs do not generalize well to the ViT search space. In this paper, we investigate for the first time how to conduct TAS in a training-free manner and devise an effective training-free TAS (TF-TAS) scheme. First, we observe that the properties of multi-head self-attention (MSA) and multi-layer perceptron (MLP) modules in ViTs are quite different, and that the synaptic diversity of MSA notably affects performance. Second, based on this observation, we devise a modular strategy in TF-TAS that…
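Training-free TAS scores each candidate with a proxy rather than training it, so the search reduces to cheap scoring and ranking. The sketch below illustrates that idea with plain random sampling under a parameter budget. It is not presented as the paper's search procedure. The search-space values, `approx_params`, and `score_fn` are all hypothetical placeholders. A real `score_fn` could wrap a DSS-style score computed from one forward/backward pass.

```python
import random
from typing import Callable, Dict, Optional, Tuple

Arch = Dict[str, int]

# Hypothetical ViT search space (illustrative values only).
SEARCH_SPACE = {
    "depth": [12, 13, 14],
    "embed_dim": [192, 216, 240],
    "num_heads": [3, 4],  # does not change the weight count at fixed embed_dim
    "mlp_ratio": [3, 4],
}


def sample_arch(rng: random.Random) -> Arch:
    """Draw one candidate uniformly from the hypothetical search space."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}


def approx_params(arch: Arch) -> int:
    """Rough weight count, ignoring biases, norms and embeddings.

    Per block: 4*d^2 in MSA (Q, K, V and output projection) plus
    2*r*d^2 in the MLP (two linear layers with hidden size r*d).
    """
    d = arch["embed_dim"]
    per_block = 4 * d * d + 2 * arch["mlp_ratio"] * d * d
    return arch["depth"] * per_block


def training_free_search(
    score_fn: Callable[[Arch], float],
    n_samples: int,
    max_params: int,
    seed: int = 0,
) -> Tuple[Optional[Arch], float]:
    """Sample candidates, skip those over budget, keep the best proxy score.

    No candidate is trained: score_fn is the only per-candidate cost.
    Returns (None, -inf) if no sampled candidate fits the budget.
    """
    rng = random.Random(seed)
    best: Optional[Arch] = None
    best_score = float("-inf")
    for _ in range(n_samples):
        arch = sample_arch(rng)
        if approx_params(arch) > max_params:
            continue
        score = score_fn(arch)
        if score > best_score:
            best, best_score = arch, score
    return best, best_score


# Usage with a toy stand-in proxy (a real one would inspect weights and gradients).
best, score = training_free_search(
    lambda a: a["depth"] * a["embed_dim"], n_samples=200, max_params=10**7
)
print(best, score)
```

Because each evaluation is a single cheap scoring call instead of a training run, the total search cost scales with the number of candidates scored rather than with training time.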

Code & Models

Repositories

decemberzhou/TF_TAS (PyTorch)

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Infrared Target Detection Methodologies

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Residual Connection · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Label Smoothing · Dropout