Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work

Khawar Islam

arXiv:2203.01536 · cs.CV · October 18, 2023 · 6 cites

TL;DR

This survey reviews recent developments in Vision Transformers, comparing their performance, strengths, and limitations, and discusses future research directions in the field of computer vision.

Contribution

It provides a comprehensive overview of recent ViT methods, analyzing their strengths, weaknesses, computational costs, and benchmarking performance against CNNs.

Findings

01. ViTs outperform CNNs on several vision tasks.

02. Current ViTs face limitations in computational efficiency.

03. Future research should address scalability and robustness.

Abstract

Vision Transformers (ViTs) are becoming an increasingly popular and dominant technique for various vision tasks compared to Convolutional Neural Networks (CNNs). As an in-demand technique in computer vision, ViTs have successfully solved various vision problems while focusing on long-range relationships. In this paper, we begin by introducing the fundamental concepts and background of the self-attention mechanism. Next, we provide a comprehensive overview of recent top-performing ViT methods, describing them in terms of strengths and weaknesses, computational cost, and training and testing datasets. We thoroughly compare the performance of various ViT algorithms and the most representative CNN methods on popular benchmark datasets. Finally, we explore some limitations with insightful observations and provide further research directions. The project page along with the collections of papers are…

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Generative Adversarial Networks and Image Synthesis