Searching Intrinsic Dimensions of Vision Transformers

Fanghui Xue; Biao Yang; Yingyong Qi; Jack Xin

arXiv:2204.07722·cs.CV·April 19, 2022

TL;DR

This paper introduces SiDT, a pruning method for vision transformers that reduces computational costs while maintaining or improving performance on complex vision tasks like object detection.

Contribution

We propose SiDT, a novel dimension search-based pruning method for vision transformers applied to complex tasks beyond image classification.

Findings

01

Backbones with 20% or 40% of their dimensions/parameters pruned can match or even outperform the unpruned models.

02

SiDT achieves comparable or better accuracy with reduced complexity.

03

The paper provides a complexity analysis and comparisons with previous pruning methods.

Abstract

Many researchers have shown that transformers perform as well as convolutional neural networks in many computer vision tasks. Meanwhile, the large computational cost of their attention modules hinders further study and application on edge devices. Some pruning methods have been developed to construct efficient vision transformers, but most of them have considered image classification tasks only. Inspired by these results, we propose SiDT, a method for pruning vision transformer backbones on more complicated vision tasks like object detection, based on the search of transformer dimensions. Experiments on the CIFAR-100 and COCO datasets show that backbones with 20% or 40% of their dimensions/parameters pruned can have similar or even better performance than the unpruned models. Moreover, we provide a complexity analysis and comparisons with previous pruning methods.
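
As a rough illustration of why the abstract treats pruned dimensions and pruned parameters together, the sketch below counts the weights of one transformer block before and after shrinking its inner dimensions. It is not the SiDT procedure. It assumes that only the attention projection width and the MLP hidden width are pruned while the embedding width stays fixed, and it ignores biases and normalization parameters. The function names and the ViT-Base-like sizes (768, 768, 3072) are chosen for illustration only.

```python
def block_params(embed_dim, attn_dim, mlp_dim):
    """Weight count (biases ignored) of one transformer block."""
    qkv = embed_dim * 3 * attn_dim   # query, key and value projections
    proj = attn_dim * embed_dim      # attention output projection
    mlp = 2 * embed_dim * mlp_dim    # the two linear layers of the MLP
    return qkv + proj + mlp


def pruned_dim(dim, ratio):
    """Width left after removing a fraction `ratio` of the dimensions."""
    return round(dim * (1 - ratio))


def remaining_fraction(ratio, embed_dim=768, attn_dim=768, mlp_dim=3072):
    """Fraction of block weights kept when the inner widths are pruned.

    Assumption: the embedding width is left untouched, so every weight
    matrix loses rows or columns along exactly one of its two axes.
    """
    full = block_params(embed_dim, attn_dim, mlp_dim)
    kept = block_params(
        embed_dim, pruned_dim(attn_dim, ratio), pruned_dim(mlp_dim, ratio)
    )
    return kept / full


if __name__ == "__main__":
    for ratio in (0.2, 0.4):
        frac = remaining_fraction(ratio)
        print(f"pruning ratio {ratio:.0%}: {frac:.1%} of block weights remain")
```

Under these assumptions each weight matrix shrinks along a single axis, so the weight count falls roughly in proportion to the pruning ratio. The proportionality would not hold if the embedding width were pruned as well.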

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · CCD and CMOS Imaging Sensors

Methods: Multi-Head Attention · Attention Is All You Need · Pruning · Linear Layer · Softmax · Residual Connection · Dense Connections · Layer Normalization · Vision Transformer