Part-guided Relational Transformers for Fine-grained Visual Recognition

Yifan Zhao; Jia Li; Xiaowu Chen; Yonghong Tian

arXiv:2212.13685 · cs.CV · December 29, 2022

TL;DR

This paper introduces PART (PArt-guided Relational Transformers), a unified framework for fine-grained visual recognition. It automatically discovers discriminative parts and models their relationships with Transformers, reporting state-of-the-art results on three benchmarks without extra inference cost.

Contribution

The paper presents a novel part-guided relational transformer framework that automatically discovers discriminative regions and models their correlations for improved fine-grained recognition.

Findings

01. Achieves state-of-the-art performance on three benchmarks.

02. Effectively discovers discriminative parts without extra inference cost.

03. Enhances spatial interactions among semantic features.

Abstract

Fine-grained visual recognition aims to classify objects with visually similar appearances into subcategories, and it has made great progress with the development of deep CNNs. However, handling the subtle differences between subcategories remains a challenge. In this paper, we propose to address this issue in one unified framework from two aspects, i.e., constructing feature-level interrelationships and capturing part-level discriminative features. This framework, named PArt-guided Relational Transformers (PART), learns discriminative part features with an automatic part discovery module and explores their intrinsic correlations with a feature transformation module that adapts Transformer models from the field of natural language processing. The part discovery module efficiently discovers the discriminative regions which are highly-corresponded to the…
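
The abstract describes a feature transformation module that adapts Transformers to model correlations among features. As a rough illustration of that general idea only, and not the authors' implementation, the sketch below applies single-head scaled dot-product self-attention with a residual connection to a few part feature vectors. The function name `relate_parts`, the identity query/key/value projections, and the toy inputs are all assumptions made to keep the example short.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def relate_parts(parts):
    """Single-head scaled dot-product self-attention over part features.

    parts: list of N feature vectors, each of length d.
    Assumption: queries, keys, and values all equal the inputs
    (identity projections); a real model would learn these projections.
    Returns (updated_parts, attention_weights).
    """
    n = len(parts)
    d = len(parts[0])
    scale = 1.0 / math.sqrt(d)

    # Pairwise similarity between every pair of parts, scaled by 1/sqrt(d).
    scores = [
        [scale * sum(q * k for q, k in zip(qi, kj)) for kj in parts]
        for qi in parts
    ]
    # Each row becomes a distribution over all parts.
    weights = [softmax(row) for row in scores]

    updated = []
    for i in range(n):
        # Weighted mixture of all part features for part i.
        mixed = [sum(weights[i][j] * parts[j][c] for j in range(n)) for c in range(d)]
        # Residual connection: keep the original feature and add the mixture.
        updated.append([p + m for p, m in zip(parts[i], mixed)])
    return updated, weights


# Hypothetical toy input: three 2-dimensional part features.
parts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
updated, weights = relate_parts(parts)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and indicates how strongly one part attends to the others, which is one simple way to express relationships among parts.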

Code & Models

Repositories

icvteam/part · PyTorch · Official

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Dense Connections · Linear Layer · Layer Normalization · Adam · Byte Pair Encoding · Residual Connection · Label Smoothing