Attention for Fine-Grained Categorization
Pierre Sermanet, Andrea Frome, Esteban Real

TL;DR
This paper shows that an attention-based recurrent neural network can classify fine-grained dog breeds by focusing on discriminative image regions without bounding-box supervision, outperforming the GoogLeNet classification model.
Contribution
It pairs the attention RNN of Ba et al. (2014) with a more powerful visual network that is pre-trained at large scale outside the RNN, achieving fine-grained categorization results better than the state of the art without spatial annotations.
Findings
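Model learns to direct high-resolution attention to discriminative regions without spatial supervision
Outperforms the state-of-the-art GoogLeNet classification model on Stanford Dogs
Discriminates dog breeds moderately well even from minimal, low-resolution inputs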
Abstract
This paper presents experiments extending the work of Ba et al. (2014) on recurrent neural models for attention into less constrained visual environments, specifically fine-grained categorization on the Stanford Dogs data set. In this work we use an RNN of the same structure but substitute a more powerful visual network and perform large-scale pre-training of the visual network outside of the attention RNN. Most work in attention models to date focuses on tasks with toy or more constrained visual environments, whereas we present results for fine-grained categorization better than the state-of-the-art GoogLeNet classification model. We show that our model learns to direct high resolution attention to the most discriminative regions without any spatial supervision such as bounding boxes, and it is able to discriminate fine-grained dog breeds moderately well even when given only an initial…
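The recurrent attention loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (`extract_glimpse`, `run_attention`), the glimpse size, the hidden size, and the random untrained weights are hypothetical, and a single linear layer stands in for the paper's much more powerful pre-trained visual network. Training is omitted entirely. The sketch only shows the control flow: look at a patch, update the recurrent state, choose the next location, and classify at the end.

```python
import math
import random


def extract_glimpse(image, center, size):
    """Crop a size x size patch around `center`.

    `center` is (row, col) in normalized [-1, 1] coordinates.
    Pixels that fall outside the image are zero-padded.
    """
    h, w = len(image), len(image[0])
    cy = int(round((center[0] + 1) / 2 * (h - 1)))
    cx = int(round((center[1] + 1) / 2 * (w - 1)))
    half = size // 2
    patch = []
    for r in range(cy - half, cy - half + size):
        row = []
        for c in range(cx - half, cx - half + size):
            inside = 0 <= r < h and 0 <= c < w
            row.append(image[r][c] if inside else 0.0)
        patch.append(row)
    return patch


def linear(weights, vec):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_matrix(rng, rows, cols, scale=0.1):
    """Random untrained weights (placeholder for learned parameters)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


def run_attention(image, num_glimpses=3, glimpse_size=4,
                  hidden=8, num_classes=5, seed=0):
    """Run a fixed number of glimpses and return (locations, logits)."""
    rng = random.Random(seed)
    n_in = glimpse_size * glimpse_size + 2  # patch pixels + location
    w_in = random_matrix(rng, hidden, n_in)       # stand-in visual network
    w_rec = random_matrix(rng, hidden, hidden)    # recurrent weights
    w_loc = random_matrix(rng, 2, hidden)         # next-location head
    w_cls = random_matrix(rng, num_classes, hidden)  # classification head

    state = [0.0] * hidden
    loc = (0.0, 0.0)  # assumption: start at the image center
    visited = []
    for _ in range(num_glimpses):
        visited.append(loc)
        patch = extract_glimpse(image, loc, glimpse_size)
        features = [p for row in patch for p in row] + list(loc)
        pre = [a + b for a, b in
               zip(linear(w_in, features), linear(w_rec, state))]
        state = [math.tanh(x) for x in pre]
        # tanh keeps the proposed location inside [-1, 1].
        loc = tuple(math.tanh(x) for x in linear(w_loc, state))
    return visited, linear(w_cls, state)
```

Calling `run_attention` on an 8x8 list-of-lists image returns the glimpse locations visited and one logit per class. With untrained weights the outputs are meaningless, so the sketch is useful only for following the data flow between glimpse, state, and location.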
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: 1x1 Convolution · Convolution · Average Pooling · Local Response Normalization · Auxiliary Classifier · Inception Module · Dropout · Dense Connections · Max Pooling
