TurboViT: Generating Fast Vision Transformers via Generative Architecture Search
Alexander Wong, Saad Abbasi, Saeejith Nair

TL;DR
TurboViT is a novel vision transformer architecture generated via generative architecture search, achieving high accuracy with significantly reduced computational complexity and latency, suitable for real-world high-throughput applications.
Contribution
This paper introduces TurboViT, a new efficient vision transformer architecture designed through generative architecture search, balancing accuracy and computational efficiency.
Findings
TurboViT achieves >2.47× lower architectural complexity than FasterViT-0 at the same accuracy.
TurboViT has >3.4× fewer FLOPs and 0.9% higher accuracy than MobileViT2-2.0.
TurboViT demonstrates >3.21× lower latency and >3.18× higher throughput than FasterViT-0 in the low-latency scenario.
Abstract
Vision transformers have shown unprecedented levels of performance in tackling various visual perception tasks in recent years. However, the architectural and computational complexity of such network architectures has made them challenging to deploy in real-world applications with high-throughput, low-memory requirements. As such, there has been significant research recently on the design of efficient vision transformer architectures. In this study, we explore the generation of fast vision transformer architecture designs via generative architecture search (GAS) to achieve a strong balance between accuracy and architectural and computational efficiency. Through this generative architecture search process, we create TurboViT, a highly efficient hierarchical vision transformer architecture design that is generated around mask unit attention and Q-pooling design patterns. The resulting…
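The two design patterns named in the abstract can be illustrated with a toy example. The sketch below is not TurboViT's generated architecture. It assumes the usual meaning of the terms in hierarchical vision transformers: mask unit attention restricts attention to non-overlapping local groups of tokens, and Q-pooling downsamples the queries so each stage emits fewer tokens. The function names, the 1-D token layout, the max-pooling choice, and the identity Q/K/V projections are all simplifying assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def max_pool_tokens(tokens, stride):
    """Pool consecutive groups of `stride` tokens by element-wise max."""
    return [
        [max(col) for col in zip(*tokens[i:i + stride])]
        for i in range(0, len(tokens), stride)
    ]

def mask_unit_attention(tokens, unit_size, q_stride):
    """Single-head attention restricted to non-overlapping units of tokens.

    Queries are max-pooled by `q_stride` inside each unit (Q-pooling), so the
    output holds len(tokens) // q_stride tokens. Keys and values stay unpooled.
    Identity projections stand in for learned Q/K/V weights (an assumption).
    """
    if len(tokens) % unit_size or unit_size % q_stride:
        raise ValueError("sizes must divide evenly")
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for start in range(0, len(tokens), unit_size):
        unit = tokens[start:start + unit_size]     # keys and values of this unit
        queries = max_pool_tokens(unit, q_stride)  # pooled queries of this unit
        for q in queries:
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in unit]
            weights = softmax(scores)
            out.append([
                sum(w * v[j] for w, v in zip(weights, unit))
                for j in range(dim)
            ])
    return out

# Eight toy tokens of dimension 2, split into two units of four tokens each.
tokens = [[float(i), float(i % 3)] for i in range(8)]
pooled = mask_unit_attention(tokens, unit_size=4, q_stride=2)
print(len(pooled), len(pooled[0]))  # prints "4 2"
```

Each query scores only the `unit_size` keys in its own unit, so the attention cost grows linearly with the number of tokens for a fixed unit size instead of quadratically, and pooling the queries halves the token count passed to the next stage in this example.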
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Visual Attention and Saliency Detection
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Dense Connections · Residual Connection · Softmax · Vision Transformer
