Mask-CNN: Localizing Parts and Selecting Descriptors for Fine-Grained Image Recognition
Xiu-Shen Wei, Chen-Wei Xie, Jianxin Wu

TL;DR
This paper introduces Mask-CNN, an end-to-end model that localizes parts and selects descriptors for fine-grained image recognition, achieving high accuracy with fewer parameters.
Contribution
It presents a novel fully convolutional Mask-CNN model that localizes discriminative parts and selects features without fully connected layers, improving efficiency and accuracy.
Findings
Achieves highest recognition accuracy among compared methods.
Has the smallest number of parameters and the lowest feature dimensionality among compared methods.
Effectively localizes parts and selects descriptors for recognition.
Abstract
Fine-grained image recognition is a challenging computer vision problem, due to the small inter-class variations caused by highly similar subordinate categories, and the large intra-class variations in poses, scales and rotations. In this paper, we propose a novel end-to-end Mask-CNN model without the fully connected layers for fine-grained recognition. Based on the part annotations of fine-grained images, the proposed model consists of a fully convolutional network to both locate the discriminative parts (e.g., head and torso), and more importantly generate object/part masks for selecting useful and meaningful convolutional descriptors. After that, a four-stream Mask-CNN model is built for aggregating the selected object- and part-level descriptors simultaneously. The proposed Mask-CNN model has the smallest number of parameters, lowest feature dimensionality and highest recognition…
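The descriptor-selection idea in the abstract can be illustrated with a minimal NumPy sketch. It assumes a convolutional feature map of shape (H, W, D) and a binary part mask already resized to the same H x W grid. The function names (`select_descriptors`, `aggregate`), the toy shapes, and the choice of average plus max pooling followed by L2 normalization are assumptions made for illustration; the abstract above does not specify how the selected descriptors are aggregated.

```python
import numpy as np


def select_descriptors(feature_map, mask):
    """Keep only the descriptors at spatial positions where the mask is nonzero.

    feature_map: array of shape (H, W, D), one D-dim descriptor per position.
    mask: array of shape (H, W); nonzero marks positions on the object/part.
    Returns an array of shape (N, D), where N is the number of kept positions.
    """
    mask = np.asarray(mask).astype(bool)
    if feature_map.shape[:2] != mask.shape:
        raise ValueError("mask must match the spatial size of the feature map")
    return feature_map[mask]


def l2_normalize(v):
    """Scale v to unit L2 norm; leave an all-zero vector unchanged."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def aggregate(descriptors):
    """Pool the selected descriptors into one vector of length 2 * D.

    Assumption for illustration: average pooling and max pooling,
    each L2-normalized, then concatenated.
    """
    if descriptors.shape[0] == 0:
        # Nothing selected by the mask: fall back to a zero vector.
        return np.zeros(2 * descriptors.shape[1])
    avg = l2_normalize(descriptors.mean(axis=0))
    mx = l2_normalize(descriptors.max(axis=0))
    return np.concatenate([avg, mx])


# Toy example: a 4x4 feature map with 8-dim descriptors and a mask
# covering the central 2x2 region (a hypothetical "part").
rng = np.random.default_rng(0)
feature_map = rng.random((4, 4, 8))
part_mask = np.zeros((4, 4), dtype=int)
part_mask[1:3, 1:3] = 1

selected = select_descriptors(feature_map, part_mask)
part_feature = aggregate(selected)
print(selected.shape, part_feature.shape)
```

In a multi-stream setting such as the four-stream model the abstract mentions, one such vector per stream could be concatenated into the final image representation. That wiring is likewise an assumption here, not a detail stated in the abstract.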
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization
