Spatial-Temporal Attention Network for Open-Set Fine-Grained Image Recognition

Jiayin Sun; Hong Wang; Qiulei Dong

arXiv:2211.13940 · cs.CV · November 28, 2022 · 1 citation

Open Access

TL;DR

This paper introduces STAN, a novel spatial-temporal attention network inspired by brain mechanisms, designed to improve fine-grained image recognition, especially in open-set scenarios, by aggregating features over multiple moments.

Contribution

The paper proposes a new spatial-temporal attention network (STAN) that enhances fine-grained recognition by integrating multiple attention modules and a Long Short-Term Memory network.

Findings

01. STAN-OSFGR outperforms 9 state-of-the-art methods on multiple datasets.

02. The proposed model effectively learns accurate attention maps for fine-grained images.

03. Experimental results demonstrate significant improvements in open-set recognition accuracy.

Abstract

Triggered by the success of transformers in various visual tasks, the spatial self-attention mechanism has recently attracted more and more attention in the computer vision community. However, we empirically found that a typical vision transformer with the spatial self-attention mechanism could not learn accurate attention maps for distinguishing different categories of fine-grained images. To address this problem, motivated by the temporal attention mechanism in brains, we propose a spatial-temporal attention network for learning fine-grained feature representations, called STAN, where the features learnt by implementing a sequence of spatial self-attention operations corresponding to multiple moments are aggregated progressively. The proposed STAN consists of four modules: a self-attention backbone module for learning a sequence of features with self-attention operations, a spatial…
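
The idea of aggregating features from a sequence of spatial self-attention operations can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`self_attention`, `mean_pool`, `aggregate_over_moments`), the identity query/key/value projections, the mean pooling, and the `keep` parameter are all assumptions made for brevity, and a simple running average stands in for the Long Short-Term Memory network that STAN uses.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), so there are no learned weights here.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def mean_pool(tokens):
    """Average the token vectors into one feature vector."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def aggregate_over_moments(tokens, num_moments=3, keep=0.5):
    """Apply self-attention repeatedly and blend the pooled features.

    Each pass plays the role of one "moment". The running average below
    is a hypothetical stand-in for the recurrent (LSTM) aggregation.
    """
    state = None
    for _ in range(num_moments):
        tokens = self_attention(tokens)
        feat = mean_pool(tokens)
        if state is None:
            state = feat
        else:
            state = [keep * s + (1 - keep) * f for s, f in zip(state, feat)]
    return state


# Three toy "patch" embeddings of dimension 2 (made-up values).
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
feature = aggregate_over_moments(patches)
```

Here `feature` is a single vector with the same dimension as each patch embedding. In a trained model the projections and the recurrent aggregator would be learned rather than fixed as they are in this sketch.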

Taxonomy

Topics: Brain Tumor Detection and Classification · Visual Attention and Saliency Detection · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Linear Layer · Dense Connections · Residual Connection · Layer Normalization · Vision Transformer