Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers

Tobias Christian Nauen; Sebastian Palacio; Federico Raue; Andreas Dengel

arXiv:2308.09372 · cs.CV · February 25, 2025

TL;DR

This paper presents a comprehensive benchmark of more than 45 vision transformer models, evaluating their efficiency in terms of accuracy, speed, and memory. It finds that ViT remains Pareto optimal and highlights the memory and parameter efficiency of hybrid attention-CNN models.

Contribution

It offers a standardized, large-scale benchmark for efficiency in vision transformers, enabling fair comparison and guiding model selection.

Findings

1. ViT remains Pareto optimal across multiple metrics.
2. Hybrid attention-CNN models show high memory and parameter efficiency.
3. Scaling up model size generally yields better efficiency than increasing input image resolution.
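
A back-of-the-envelope sketch of why resolution is costly: in a ViT with patch size p, a square image of side s yields (s/p)^2 patch tokens, and each attention head builds an N x N score matrix over those N tokens, so that matrix grows quadratically with the token count. The helper names and the patch size of 16 are assumptions for illustration; the sketch counts only attention-matrix entries (not full FLOPs, and it ignores the class token) and only shows the cost side of the trade-off. It is not the paper's measurement method.

```python
def num_tokens(image_size: int, patch_size: int = 16) -> int:
    """Patch tokens for a square image (hypothetical helper; class token ignored)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


def attention_entries(image_size: int, patch_size: int = 16) -> int:
    """Entries in one head's N x N attention score matrix (hypothetical helper)."""
    n = num_tokens(image_size, patch_size)
    return n * n


for size in (224, 384):
    # prints: 224 196 38416, then 384 576 331776
    print(size, num_tokens(size), attention_entries(size))

ratio = attention_entries(384) / attention_entries(224)
print(f"384px vs 224px attention-matrix ratio: {ratio:.2f}")  # 8.64
```

Going from 224 to 384 pixels roughly triples the token count but multiplies the attention-matrix size by about 8.6, which is one reason higher resolution can be an expensive way to buy accuracy.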

Abstract

Self-attention in Transformers comes with a high computational cost because of their quadratic computational complexity, but their effectiveness in addressing problems in language and vision has sparked extensive research aimed at enhancing their efficiency. However, diverse experimental conditions, spanning multiple input domains, prevent a fair comparison based solely on reported results, posing challenges for model selection. To address this gap in comparability, we perform a large-scale benchmark of more than 45 models for image classification, evaluating key efficiency aspects, including accuracy, speed, and memory usage. Our benchmark provides a standardized baseline for efficiency-oriented transformers. We analyze the results based on the Pareto front -- the boundary of optimal models. Surprisingly, despite claims of other models being more efficient, ViT remains Pareto optimal…
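
The Pareto front is the set of models that no other model beats on every metric at once. A minimal sketch for two higher-is-better metrics (throughput and accuracy), assuming a simple hypothetical `Result` record and made-up model names and numbers; it is not the authors' evaluation code from the repository, and the toy values are not results from the paper.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    throughput: float  # images/s, higher is better (toy units)
    accuracy: float    # top-1 %, higher is better


def dominates(a: Result, b: Result) -> bool:
    """True if a is at least as good as b on both metrics and strictly better on one."""
    return (
        a.throughput >= b.throughput
        and a.accuracy >= b.accuracy
        and (a.throughput > b.throughput or a.accuracy > b.accuracy)
    )


def pareto_front(results: list[Result]) -> list[Result]:
    """Non-dominated results sorted by throughput (O(n^2), chosen for clarity)."""
    front = [r for r in results if not any(dominates(o, r) for o in results)]
    return sorted(front, key=lambda r: r.throughput)


# Hypothetical toy numbers for illustration only, NOT taken from the paper.
toy = [
    Result("model_a", throughput=1000.0, accuracy=78.0),
    Result("model_b", throughput=600.0, accuracy=81.0),
    Result("model_c", throughput=500.0, accuracy=80.0),  # dominated by model_b
    Result("model_d", throughput=200.0, accuracy=83.0),
]
print([r.name for r in pareto_front(toy)])  # ['model_d', 'model_b', 'model_a']
```

A model is "Pareto optimal" in this sense when it appears in the returned list; with more metrics (for example memory), `dominates` extends by comparing each additional metric the same way.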

Code & Models

Repositories

tobna/whattransformertofavor (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices