Fine-grained pose prediction, normalization, and recognition
Ning Zhang, Evan Shelhamer, Yang Gao, Trevor Darrell

TL;DR
This paper introduces an end-to-end trainable deep network for fine-grained classification that jointly localizes parts and learns pose-normalized features, improving accuracy on the CUB200 dataset.
Contribution
The paper unifies part localization and pose-normalized feature learning in a single deep network, trained end to end with keypoint and class-label supervision rather than in separate stages.
Findings
Achieves state-of-the-art results on the CUB200 dataset.
Demonstrates the effectiveness of end-to-end training with keypoint supervision.
Highlights the importance of strong supervision for fine-grained tasks.
Abstract
Pose variation and subtle differences in appearance are key challenges to fine-grained classification. While deep networks have markedly improved general recognition, many approaches to fine-grained recognition rely on anchoring networks to parts for better accuracy. Identifying parts to find correspondence discounts pose variation so that features can be tuned to appearance. To this end previous methods have examined how to find parts and extract pose-normalized features. These methods have generally separated fine-grained recognition into stages which first localize parts using hand-engineered and coarsely-localized proposal features, and then separately learn deep descriptors centered on inferred part positions. We unify these steps in an end-to-end trainable network supervised by keypoint locations and class labels that localizes parts by a fully convolutional network to focus the…
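To make the idea of localizing parts and then extracting features centered on them concrete, here is a minimal sketch in plain Python. It is not the authors' network: it assumes the part heatmaps have already been produced by some localization model, and it only shows the inference-time step of picking the peak of each heatmap and cropping a fixed-size window of features around it. The names locate_part, crop_around, and pose_normalize, and the window size, are hypothetical choices for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]


def locate_part(heatmap: Grid) -> Tuple[int, int]:
    """Return (row, col) of the highest-scoring cell in a part heatmap."""
    best = (0, 0)
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > heatmap[best[0]][best[1]]:
                best = (r, c)
    return best


def crop_around(features: Grid, center: Tuple[int, int], size: int) -> Grid:
    """Crop a size x size window centered on `center`.

    The window is shifted to stay inside the grid, so `size` must not
    exceed the grid's height or width (an assumption of this sketch).
    """
    rows, cols = len(features), len(features[0])
    top = min(max(center[0] - size // 2, 0), rows - size)
    left = min(max(center[1] - size // 2, 0), cols - size)
    return [row[left:left + size] for row in features[top:top + size]]


def pose_normalize(features: Grid, heatmaps: List[Grid], size: int = 3) -> List[Grid]:
    """Return one part-centered feature crop per part heatmap."""
    return [crop_around(features, locate_part(h), size) for h in heatmaps]
```

Because each crop is anchored to a predicted part position rather than to the image frame, the same part lands in the same place in the crop regardless of the object's pose. In the end-to-end setting the abstract describes, localization and feature learning are trained jointly, which this hard-argmax sketch does not attempt to reproduce.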
Taxonomy
Topics: Image Processing and 3D Reconstruction · Human Pose and Action Recognition · Image and Object Detection Techniques
