Where to Focus: Deep Attention-based Spatially Recurrent Bilinear Networks for Fine-Grained Visual Recognition

Lin Wu; Yang Wang

arXiv:1709.05769·cs.CV·September 19, 2017·6 cites

TL;DR

This paper introduces a novel attention-based deep neural network that combines two CNNs, bilinear pooling, and spatially recurrent attention to improve fine-grained visual recognition by focusing on critical object regions despite appearance variations.

Contribution

It proposes an end-to-end trainable model integrating two-stream CNNs, bilinear pooling, and spatially recurrent attention for more accurate part detection and feature extraction in fine-grained recognition.
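As a rough illustration of the bilinear pooling step, the sketch below follows the standard bilinear-CNN recipe: a sum-pooled outer product of two feature maps, followed by a signed square root and L2 normalisation. Whether this paper uses exactly these normalisation steps is an assumption, and the function name `bilinear_pool` is hypothetical. Each feature map is a plain list with one row per spatial location and one column per channel.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = spatial locations, columns = channels


def bilinear_pool(feat_a: Matrix, feat_b: Matrix) -> List[float]:
    """Sum-pool the outer product of two CNN streams over all locations,
    then apply signed square root and L2 normalisation (B-CNN style)."""
    if not feat_a or len(feat_a) != len(feat_b):
        raise ValueError("both streams must share the same non-empty set of locations")
    ca, cb = len(feat_a[0]), len(feat_b[0])

    # Accumulate the channel-by-channel outer product at every location.
    pooled = [[0.0] * cb for _ in range(ca)]
    for fa, fb in zip(feat_a, feat_b):
        for i in range(ca):
            for j in range(cb):
                pooled[i][j] += fa[i] * fb[j]

    # Flatten, apply signed square root, then L2-normalise.
    vec = [v for row in pooled for v in row]
    vec = [math.copysign(math.sqrt(abs(v)), v) for v in vec]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]
```

Because sum pooling over locations discards where each feature occurred, the resulting descriptor is orderless; this is one motivation for adding a separate, spatially recurrent component.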

Findings

01

Outperforms existing methods in fine-grained image classification.

02

Effective in person re-identification tasks.

03

Shows robustness to occlusions and viewpoint changes.

Abstract

Fine-grained visual recognition typically depends on modeling subtle differences between object parts. However, these parts often exhibit dramatic visual variations such as occlusions, viewpoint changes, and spatial transformations, which make them hard to detect. In this paper, we present a novel attention-based model that automatically, selectively, and accurately focuses on critical object regions of higher importance despite appearance variations. Given an image, two different Convolutional Neural Networks (CNNs) are constructed, and their outputs are correlated through bilinear pooling to simultaneously focus on discriminative regions and extract relevant features. To capture the spatial distribution among local regions with visual attention, soft-attention-based spatial Long Short-Term Memory units (LSTMs) are incorporated to realize spatially recurrent yet visually selective over local…
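A minimal sketch of a single soft-attention step over local region features, assuming scaled dot-product scoring between each region and the recurrent hidden state. The paper's actual scoring function and its spatial LSTM update are not reproduced here, and `soft_attention` is a hypothetical name. In a full model, the returned context vector would be passed to a spatial LSTM cell that produces the next hidden state.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(regions: List[Vector], hidden: Vector) -> Tuple[Vector, Vector]:
    """Score each local region against the hidden state (assumed scaled
    dot product), normalise the scores into attention weights, and return
    (weights, attention-weighted context vector)."""
    d = len(hidden)
    if not regions or any(len(r) != d for r in regions):
        raise ValueError("region features must be non-empty and match the hidden size")
    scores = [sum(r_j * h_j for r_j, h_j in zip(r, hidden)) / math.sqrt(d) for r in regions]
    alphas = softmax(scores)
    # Context = convex combination of region features under the attention weights.
    context = [sum(a * r[j] for a, r in zip(alphas, regions)) for j in range(d)]
    return alphas, context
```

Since the weights come from a softmax, every region receives some gradient, which keeps this "soft" form differentiable end to end, unlike hard region selection.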

Taxonomy

Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Advanced Neural Network Applications