Attention Is not Everything: Efficient Alternatives for Vision
Nur Mohammad Kazi, Ibteshum Khaled, Md. Luthful Hasan Galib, Ali Faruk Shihab, Md. Rakibul Islam

TL;DR
This review categorizes and analyzes non-Transformer computer vision methods, focusing on their efficiency, scalability, interpretability, and robustness, to identify future research opportunities.
Contribution
It provides a comprehensive taxonomy of non-Transformer vision methods, organizing 40 papers into categories and evaluating their strengths and challenges.
Findings
Non-Transformer methods are competitive with Transformers in some aspects.
The taxonomy highlights key differences in efficiency and robustness.
Future research opportunities are identified in scalability and interpretability.
Abstract
Recent advances in computer vision have been driven mainly by Transformer-based models. However, many non-Transformer methods continue to perform well and compete directly with Transformer-based models. This review presents a comprehensive taxonomy of such methods, organizing them into categories such as convolution-based models, MLP-based models, state-space-based models, and more. The methods are examined in terms of efficiency, scalability, interpretability, and robustness. A total of 40 papers were selected for this study. The goal is to give an overview of non-Transformer methods and to identify the challenges and opportunities for future computer vision research.
