Attention mechanisms and deep learning for machine vision: A survey of the state of the art
Abdul Mueed Hafiz, Shabir Ahmad Parah, Rouf Ul Alam Bhat

TL;DR
This survey reviews recent advances in attention mechanisms and deep learning for machine vision, highlighting vision transformers, their challenges, and hybrid approaches combining attention with traditional methods.
Contribution
It provides a comprehensive overview of attention-based deep architectures in machine vision, discussing key algorithms, issues, and emerging trends in the field.
Findings
Vision transformers challenge traditional deep learning methods.
Hybrid models leverage advantages of attention and deep learning.
Attention mechanisms improve performance but require large amounts of data, long training times, and substantial computational resources.
Abstract
With the advent of state-of-the-art, nature-inspired, pure attention-based models, i.e. transformers, and their success in natural language processing (NLP), their extension to machine vision (MV) tasks was both inevitable and a widely felt need. Vision transformers (ViTs) were subsequently introduced, and they now pose a serious challenge to established deep learning based machine vision techniques. However, pure attention-based architectures such as transformers require huge amounts of data, long training times, and large computational resources. Some recent works suggest that combining these two approaches can yield systems that have the advantages of both. Accordingly, this state-of-the-art survey is presented in the hope that it will give readers useful information about this interesting and promising research area. A gentle introduction to attention mechanisms is…
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
