Attention mechanisms and deep learning for machine vision: A survey of the state of the art

Abdul Mueed Hafiz; Shabir Ahmad Parah; Rouf Ul Alam Bhat

arXiv:2106.07550 · cs.CV · June 15, 2021 · 5 cites

TL;DR

This survey reviews recent advances in attention mechanisms and deep learning for machine vision, highlighting vision transformers, their challenges, and hybrid approaches combining attention with traditional methods.

Contribution

It provides a comprehensive overview of attention-based deep architectures in machine vision, discussing key algorithms, issues, and emerging trends in the field.

Findings

01. Vision transformers challenge established deep learning methods for machine vision.
02. Hybrid models combine the advantages of attention mechanisms and conventional deep learning.
03. Attention mechanisms improve performance but require large amounts of data and computational resources.

Abstract

With the advent of state-of-the-art, nature-inspired, pure attention-based models, i.e. transformers, and their success in natural language processing (NLP), their extension to machine vision (MV) tasks was inevitable and much anticipated. Vision transformers (ViTs) were subsequently introduced and now pose a serious challenge to established deep learning based machine vision techniques. However, pure attention-based architectures such as transformers require large amounts of data, long training times, and substantial computational resources. Some recent works suggest that combining attention mechanisms with established deep learning techniques can yield systems that have the advantages of both. Accordingly, this survey aims to give readers useful information about this interesting and promising research area. A gentle introduction to attention mechanisms is…

Code & Models

Repositories

changlin31/BossNAS
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning