Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Tobias Christian Nauen, Sebastian Palacio, Federico Raue, Andreas Dengel

TL;DR
This paper presents a comprehensive benchmark of over 45 vision transformer models, evaluating their efficiency in terms of accuracy, speed, and memory, revealing ViT's Pareto optimality and highlighting the efficiency of hybrid models.
Contribution
It offers a standardized, large-scale benchmark for efficiency in vision transformers, enabling fair comparison and guiding model selection.
Findings
ViT remains Pareto optimal across multiple metrics.
Hybrid attention-CNN models show high memory and parameter efficiency.
Scaling up the model generally yields better efficiency than using higher-resolution images.
Abstract
Self-attention in Transformers comes with a high computational cost because of its quadratic computational complexity, but the effectiveness of Transformers in addressing problems in language and vision has sparked extensive research aimed at enhancing their efficiency. However, diverse experimental conditions, spanning multiple input domains, prevent a fair comparison based solely on reported results, posing challenges for model selection. To address this gap in comparability, we perform a large-scale benchmark of more than 45 models for image classification, evaluating key efficiency aspects, including accuracy, speed, and memory usage. Our benchmark provides a standardized baseline for efficiency-oriented transformers. We analyze the results based on the Pareto front -- the boundary of optimal models. Surprisingly, despite claims of other models being more efficient, ViT remains Pareto optimal…
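The Pareto front mentioned in the abstract can be illustrated with a minimal sketch. It assumes three metrics (accuracy and throughput, where higher is better, and memory, where lower is better). The `Model` type, the function names, and all numbers below are hypothetical placeholders for illustration; they are not the paper's code or results.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    """Hypothetical record of one benchmarked model (not the paper's schema)."""
    name: str
    accuracy: float    # top-1 accuracy, higher is better
    throughput: float  # images per second, higher is better
    memory_gb: float   # peak memory, lower is better


def dominates(a: Model, b: Model) -> bool:
    """True if a is at least as good as b on every metric and strictly better on one."""
    no_worse = (
        a.accuracy >= b.accuracy
        and a.throughput >= b.throughput
        and a.memory_gb <= b.memory_gb
    )
    strictly_better = (
        a.accuracy > b.accuracy
        or a.throughput > b.throughput
        or a.memory_gb < b.memory_gb
    )
    return no_worse and strictly_better


def pareto_front(models: List[Model]) -> List[Model]:
    """Keep every model that no other model dominates (O(n^2) pairwise check)."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


# Made-up numbers, chosen only to show the mechanics.
models = [
    Model("model_a", 0.80, 1000.0, 4.0),
    Model("model_b", 0.78, 1500.0, 3.0),
    Model("model_c", 0.75, 900.0, 5.0),   # worse than model_a on all three metrics
    Model("model_d", 0.82, 400.0, 8.0),
]

front = pareto_front(models)
print([m.name for m in front])
```

In this toy data, `model_c` drops out because `model_a` beats it on every metric, while the other three each win on at least one axis and therefore stay on the front. A model being Pareto optimal in this sense means only that no single competitor is better on all measured metrics at once.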
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices
