Coarse2Fine: A Two-stage Training Method for Fine-grained Visual Classification

Amir Erfan Eshratifar; David Eigen; Michael Gormish; Massoud Pedram

arXiv:1909.02680·cs.CV·September 9, 2019

TL;DR

This paper introduces Coarse2Fine, a two-stage training method for fine-grained visual classification that improves attention models by better localizing discriminative features, leading to state-of-the-art accuracy.

Contribution

The paper proposes Coarse2Fine, a training method for visual attention networks that creates a differentiable path from the input space to the attended feature maps, so that the attention maps are guided toward the fine-grained features.

Findings

01. Surpasses state-of-the-art accuracy on fine-grained tasks

02. Effective inverse mapping from attended features to image regions

03. Orthogonal initialization of attention weights improves performance
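
The third finding refers to orthogonal initialization of the attention weights. The following is a minimal sketch of what such an initialization can look like. It assumes the attention weights form a matrix with one row per attention map and one column per feature channel. That layout, the function name `orthogonal_rows`, and the Gram-Schmidt construction are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def orthogonal_rows(num_maps, num_channels, seed=0):
    """Return num_maps mutually orthonormal vectors of length num_channels.

    Hypothetical helper: each row stands in for the initial weights of one
    attention map over the feature channels.
    """
    if num_maps > num_channels:
        raise ValueError("need num_maps <= num_channels")
    rng = random.Random(seed)
    rows = []
    while len(rows) < num_maps:
        # Start from a random Gaussian vector.
        v = [rng.gauss(0.0, 1.0) for _ in range(num_channels)]
        # Modified Gram-Schmidt: remove the component along each accepted row.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        # Skip the (very unlikely) degenerate draw and try again.
        if norm > 1e-8:
            rows.append([a / norm for a in v])
    return rows


weights = orthogonal_rows(num_maps=4, num_channels=16)
```

Each row has unit norm and every pair of rows has zero dot product, so under this assumed layout each attention map starts out responding to a different direction in channel space. Deep learning frameworks typically ship their own orthogonal initializers, which would normally be preferred over a hand-written one.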

Abstract

Small inter-class and large intra-class variations are the main challenges in fine-grained visual classification. Objects from different classes share visually similar structures, and objects in the same class can have different poses and viewpoints. Therefore, the proper extraction of discriminative local features (e.g., a bird's beak or a car's headlight) is crucial. Most recent successes on this problem are based on attention models that can localize and attend to the local discriminative object parts. In this work, we propose a training method for visual attention networks, Coarse2Fine, which creates a differentiable path from the input space to the attended feature maps. Coarse2Fine learns an inverse mapping function from the attended feature maps to the informative regions in the raw image, which guides the attention maps to better attend to the fine-grained features. We…
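
The abstract refers to attention maps and attended feature maps without defining them. As a rough illustration of the general idea, the sketch below applies one spatial attention map to a small feature grid: a softmax over positions turns attention scores into weights, and the attended feature is the weighted sum of the per-position feature vectors. This is generic spatial attention, not the Coarse2Fine architecture or its inverse mapping. The function name `spatial_attention_pool` and the nested-list data layout are assumptions made for the example.

```python
import math


def spatial_attention_pool(features, scores):
    """Pool an H x W x C feature grid with one H x W map of attention scores.

    Hypothetical helper. Returns (weights, pooled): the softmax weights in
    row-major order and the C-dimensional attended feature vector.
    """
    flat_scores = [s for row in scores for s in row]
    flat_feats = [vec for row in features for vec in row]

    # Numerically stable softmax over all spatial positions.
    peak = max(flat_scores)
    exps = [math.exp(s - peak) for s in flat_scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the per-position feature vectors.
    channels = len(flat_feats[0])
    pooled = [
        sum(w * vec[c] for w, vec in zip(weights, flat_feats))
        for c in range(channels)
    ]
    return weights, pooled


# A 2 x 2 grid with 2 channels; the bottom-right position gets a high score.
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[2.0, 2.0], [5.0, 7.0]]]
scores = [[0.0, 0.0],
          [0.0, 50.0]]
weights, pooled = spatial_attention_pool(features, scores)
```

With one score far above the rest, nearly all of the weight falls on that position, so `pooled` is approximately the feature vector stored there. With equal scores the result is the plain average over positions.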
