Coarse2Fine: A Two-stage Training Method for Fine-grained Visual Classification
Amir Erfan Eshratifar, David Eigen, Michael Gormish, Massoud Pedram

TL;DR
This paper introduces Coarse2Fine, a two-stage training method for fine-grained visual classification that improves attention models by better localizing discriminative features, leading to state-of-the-art accuracy.
Contribution
The paper proposes Coarse2Fine, a training approach for visual attention networks that creates a differentiable path from the input space to the attended feature maps, so that the attention maps better localize fine-grained features.
Findings
Surpasses the previous state-of-the-art accuracy on fine-grained classification tasks
Effective inverse mapping from attended features to image regions
Orthogonal initialization of attention weights improves performance
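To make the orthogonal-initialization finding concrete, here is a minimal sketch of building mutually orthonormal weight vectors with Gram-Schmidt. It is an illustration only: the summary does not say how the authors construct their initialization, and the names `orthogonal_rows`, `num_maps`, and `channels` are hypothetical. In PyTorch, `torch.nn.init.orthogonal_` serves the same purpose.

```python
import math
import random


def orthogonal_rows(num_maps, channels, seed=0):
    """Return num_maps mutually orthonormal vectors of length channels.

    Hypothetical helper: each row could seed the weights of one
    attention map (e.g. a 1x1 convolution over `channels` features).
    """
    if num_maps > channels:
        raise ValueError("need num_maps <= channels for orthonormal rows")
    rng = random.Random(seed)
    rows = []
    while len(rows) < num_maps:
        v = [rng.gauss(0.0, 1.0) for _ in range(channels)]
        # Gram-Schmidt: remove the components along rows already accepted.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # resample in the unlikely degenerate case
            rows.append([a / norm for a in v])
    return rows


def gram(rows):
    """Pairwise dot products; close to the identity for orthonormal rows."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


weights = orthogonal_rows(num_maps=4, channels=16)
```

Starting from orthogonal weights means that, at initialization, no two attention maps respond to the same direction in feature space, which is one plausible reason such an initialization would help the maps specialize on different object parts.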
Abstract
Small inter-class and large intra-class variations are the main challenges in fine-grained visual classification. Objects from different classes share visually similar structures, and objects in the same class can have different poses and viewpoints. Therefore, the proper extraction of discriminative local features (e.g., a bird's beak or a car's headlight) is crucial. Most recent successes on this problem are based on attention models, which can localize and attend to the discriminative local object parts. In this work, we propose a training method for visual attention networks, Coarse2Fine, which creates a differentiable path from the input space to the attended feature maps. Coarse2Fine learns an inverse mapping function from the attended feature maps to the informative regions in the raw image, which guides the attention maps to better attend to the fine-grained features. We…
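To picture what mapping an attention map back to an image region means, the sketch below uses a simple threshold-and-bounding-box heuristic. This is not the learned inverse mapping the abstract describes: the heuristic is hand-written and not differentiable, and the function `attention_to_box` and its `rel_threshold` parameter are hypothetical names chosen for illustration.

```python
def attention_to_box(attn, image_h, image_w, rel_threshold=0.5):
    """Map a coarse 2D attention map to a box in image coordinates.

    attn is a list of rows of non-negative scores. Cells scoring at
    least rel_threshold * max(attn) are kept, and their bounding box
    is scaled up to the image size.
    Returns (top, left, bottom, right) in pixels.
    """
    rows, cols = len(attn), len(attn[0])
    peak = max(max(r) for r in attn)
    cutoff = rel_threshold * peak
    kept = [(i, j) for i in range(rows) for j in range(cols)
            if attn[i][j] >= cutoff]
    ys = [i for i, _ in kept]
    xs = [j for _, j in kept]
    # Each attention cell covers a (scale_y x scale_x) patch of the image.
    scale_y, scale_x = image_h / rows, image_w / cols
    top, bottom = int(min(ys) * scale_y), int((max(ys) + 1) * scale_y)
    left, right = int(min(xs) * scale_x), int((max(xs) + 1) * scale_x)
    return top, left, bottom, right


# A 4x4 attention map over a 64x64 image, peaking in the second row.
attn = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.6, 1.0, 0.1],
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
box = attention_to_box(attn, image_h=64, image_w=64)
print(box)  # (16, 16, 32, 48)
```

The hard threshold blocks gradients from flowing from the cropped region back into the attention map, which is the kind of gap a differentiable path from the input space to the attended feature maps is meant to close.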
