TurboViT: Generating Fast Vision Transformers via Generative Architecture Search

Alexander Wong; Saad Abbasi; Saeejith Nair

arXiv:2308.11421 · cs.CV · August 23, 2023

TL;DR

TurboViT is a novel vision transformer architecture generated via generative architecture search, achieving high accuracy with significantly reduced computational complexity and latency, suitable for real-world high-throughput applications.

Contribution

This paper introduces TurboViT, a new efficient vision transformer architecture designed through generative architecture search, balancing accuracy and computational efficiency.

Findings

01

TurboViT is >2.47× less complex than FasterViT-0 at the same accuracy.

02

TurboViT has >3.4× fewer FLOPs and 0.9% higher accuracy than MobileViT2-2.0.

03

TurboViT demonstrates >3.21× lower latency and >3.18× higher throughput in low-latency scenarios.

Abstract

Vision transformers have shown unprecedented levels of performance in tackling various visual perception tasks in recent years. However, the architectural and computational complexity of such network architectures has made them challenging to deploy in real-world applications with high-throughput, low-memory requirements. As such, there has been significant recent research on the design of efficient vision transformer architectures. In this study, we explore the generation of fast vision transformer architecture designs via generative architecture search (GAS) to achieve a strong balance between accuracy and architectural and computational efficiency. Through this generative architecture search process, we create TurboViT, a highly efficient hierarchical vision transformer architecture design that is generated around mask unit attention and Q-pooling design patterns. The resulting…
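The abstract names mask unit attention and Q-pooling without describing them. As a rough illustration only, the sketch below shows the general idea as it is commonly described for hierarchical vision transformers: attention is computed inside fixed-size groups of tokens ("mask units"), and the queries in each unit are max-pooled so the output has fewer tokens than the input. It is not TurboViT's architecture. The function name, the single-head setup, the identity projections (queries, keys, and values are the raw tokens), and the parameters unit_size and q_stride are all assumptions made for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def mask_unit_attention(tokens, unit_size, q_stride):
    """Hypothetical single-head mask unit attention with Q-pooling.

    tokens:    list of token vectors (lists of floats), all of equal length
    unit_size: number of consecutive tokens per mask unit (assumed parameter)
    q_stride:  number of consecutive queries pooled into one (assumed parameter)

    Assumption: no learned projections, so Q = K = V = tokens.
    Returns len(tokens) // q_stride output vectors.
    """
    if len(tokens) % unit_size or unit_size % q_stride:
        raise ValueError("token count, unit_size and q_stride must divide evenly")
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for start in range(0, len(tokens), unit_size):
        unit = tokens[start:start + unit_size]  # attention never leaves this unit
        # Q-pooling: element-wise max over each group of q_stride queries.
        queries = [
            [max(tok[d] for tok in unit[g:g + q_stride]) for d in range(dim)]
            for g in range(0, unit_size, q_stride)
        ]
        for q in queries:
            # Scaled dot-product scores against every key in the same unit.
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in unit]
            weights = softmax(scores)
            # Weighted sum of the unit's values.
            out.append(
                [sum(w * v[d] for w, v in zip(weights, unit)) for d in range(dim)]
            )
    return out


# Usage: 8 tokens of dimension 2, two mask units of 4 tokens, queries pooled 2:1.
tokens = [[float(i), float(i % 3)] for i in range(8)]
pooled = mask_unit_attention(tokens, unit_size=4, q_stride=2)
print(len(tokens), "tokens in,", len(pooled), "tokens out")
```

In this sketch the cost of the score computation scales with the unit size rather than the full sequence length, and pooling the queries reduces the number of output tokens, which is the kind of saving such design patterns aim for.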

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Visual Attention and Saliency Detection

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Dense Connections · Residual Connection · Softmax · Vision Transformer