TL;DR
This paper introduces PART, a unified relational-transformer framework for fine-grained visual recognition. It automatically discovers discriminative parts and models their relationships, achieving state-of-the-art results without extra inference complexity.
Contribution
The paper presents a novel part-guided relational transformer framework that automatically discovers discriminative regions and models their correlations for improved fine-grained recognition.
Findings
Achieves state-of-the-art performance on three benchmarks.
Effectively discovers discriminative parts without extra inference cost.
Enhances spatial interactions among semantic features.
Abstract
Fine-grained visual recognition aims to classify objects with visually similar appearances into subcategories, and it has made great progress with the development of deep CNNs. However, handling the subtle differences between subcategories remains a challenge. In this paper, we propose to address this issue in one unified framework from two aspects: constructing feature-level interrelationships and capturing part-level discriminative features. This framework, PArt-guided Relational Transformers (PART), learns discriminative part features with an automatic part discovery module and explores intrinsic correlations with a feature transformation module that adapts Transformer models from natural language processing. The part discovery module efficiently discovers the discriminative regions which are highly-corresponded to the…
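Illustration
The abstract describes a feature transformation module that adapts Transformer models to relate features with one another. As a rough illustration of that general idea, and not the authors' implementation, the sketch below flattens a CNN feature map into one token per spatial location and applies single-head scaled dot-product self-attention. The function name `spatial_self_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and all tensor sizes are assumptions made for this example only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_self_attention(feature_map, w_q, w_k, w_v):
    """Single-head self-attention over the spatial positions of a feature map.

    feature_map: array of shape (C, H, W), e.g. a CNN backbone output.
    w_q, w_k, w_v: projection matrices of shape (C, D) (hypothetical weights).
    Returns the attended features with shape (D, H, W) and the
    (H*W, H*W) attention matrix relating every position to every other.
    """
    c, h, w = feature_map.shape
    # Treat each spatial location as one token with C channels: (H*W, C).
    tokens = feature_map.reshape(c, h * w).T
    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v
    # Scaled dot-product scores between all pairs of positions.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    out = attn @ v  # (H*W, D)
    # Restore the spatial layout: (D, H, W).
    return out.T.reshape(-1, h, w), attn


# Toy input: random values stand in for real backbone features.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 4))
w_q, w_k, w_v = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
out, attn = spatial_self_attention(fmap, w_q, w_k, w_v)
```

Each row of `attn` is a probability distribution over all spatial positions, which is what lets every location aggregate information from the rest of the image. A full Transformer layer would add multiple heads, residual connections, layer normalization, and position encodings, all of which appear in the method tags for this paper but are omitted here for brevity.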
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Dense Connections · Linear Layer · Layer Normalization · Adam · Byte Pair Encoding · Residual Connection · Label Smoothing
