Transformer with Peak Suppression and Knowledge Guidance for Fine-grained Image Recognition

Xinda Liu; Lili Wang; Xiaoguang Han

arXiv:2107.06538 · cs.MM · June 7, 2022

TL;DR

This paper introduces a transformer architecture with peak suppression and knowledge guidance modules for fine-grained image recognition. The modules diversify the discriminative features drawn from a single image and aggregate discriminative clues across multiple images, and the resulting method outperforms existing approaches on six datasets.

Contribution

The paper proposes a transformer-based method whose peak suppression and knowledge guidance modules help the network make better use of discriminative features in fine-grained recognition tasks.

Findings

01. Outperforms existing methods on six datasets.

02. Extracts more information from regions that other methods neglect.

03. Improves the aggregation of discriminative clues across images.
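
Finding 03 and the TL;DR describe aggregating discriminative clues across multiple images, but this page does not show how the knowledge guidance module does it. The sketch below is a hypothetical stand-in, not the authors' module: the `KnowledgeBank` class, its `momentum` parameter, the exponential-moving-average update and the cosine-similarity scoring are all our assumptions. It keeps one embedding per class, folds each training image's representation into its class embedding, and scores a new image by comparing it with every class embedding.

```python
import numpy as np


class KnowledgeBank:
    """Hypothetical per-class embedding bank that pools features from many images."""

    def __init__(self, num_classes, dim, momentum=0.9):
        self.embeddings = np.zeros((num_classes, dim))
        self.seen = np.zeros(num_classes, dtype=bool)
        self.momentum = momentum  # weight kept from the previous class embedding

    def update(self, feature, label):
        """Fold one image representation into its class embedding (EMA, an assumption)."""
        if not self.seen[label]:
            self.embeddings[label] = feature  # first image of this class
            self.seen[label] = True
        else:
            m = self.momentum
            self.embeddings[label] = m * self.embeddings[label] + (1 - m) * feature

    def responses(self, feature):
        """Cosine similarity between one image feature and every class embedding."""
        norms = np.linalg.norm(self.embeddings, axis=1) * np.linalg.norm(feature)
        return self.embeddings @ feature / np.maximum(norms, 1e-12)


bank = KnowledgeBank(num_classes=2, dim=3, momentum=0.5)
bank.update(np.array([1.0, 0.0, 0.0]), label=0)
bank.update(np.array([0.0, 1.0, 0.0]), label=0)  # class 0 now pools two images
bank.update(np.array([0.0, 0.0, 1.0]), label=1)
scores = bank.responses(np.array([1.0, 1.0, 0.0]))  # scores ≈ [1.0, 0.0]
predicted = int(np.argmax(scores))  # predicted == 0
```

Because every update mixes a new image into a running class summary, a test image is scored against evidence gathered from many training images rather than from itself alone. A learned embedding table trained by backpropagation would be an equally plausible reading of "knowledge guidance".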

Abstract

Fine-grained image recognition is challenging because discriminative clues are usually fragmented, whether from a single image or multiple images. Despite their significant improvements, most existing methods still focus on the most discriminative parts from a single image, ignoring informative details in other regions and lacking consideration of clues from other associated images. In this paper, we analyze the difficulties of fine-grained image recognition from a new perspective and propose a transformer architecture with the peak suppression module and knowledge guidance module, which respects the diversification of discriminative features in a single image and the aggregation of discriminative clues among multiple images. Specifically, the peak suppression module first utilizes a linear projection to convert the input image into sequential tokens. It then blocks the token based on…
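
The abstract's first step, converting the image into sequential tokens with a linear projection, matches the standard ViT-style patch embedding. The blocking rule itself is cut off above, so the suppression step in this sketch is an assumption suggested by the module's name: it blocks the token(s) receiving the highest attention from a single query vector by zeroing them, so that later processing has to rely on the remaining, less salient regions. The functions `image_to_tokens`, `cls_attention` and `suppress_peak`, the random projection weights and the single-query attention are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def image_to_tokens(image, patch_size, projection):
    """Split an (H, W, C) image into non-overlapping patches and project each linearly.

    `projection` is a (patch_size * patch_size * C, D) weight matrix; here it is a
    hypothetical random matrix rather than trained weights.
    Returns an (N, D) array with N = (H // patch_size) * (W // patch_size).
    """
    h, w, c = image.shape
    p = patch_size
    if h % p or w % p:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = (
        image.reshape(h // p, p, w // p, p, c)
        .transpose(0, 2, 1, 3, 4)  # -> (rows, cols, p, p, C), row-major patch order
        .reshape(-1, p * p * c)  # one flattened patch per row
    )
    return patches @ projection


def cls_attention(tokens, query):
    """Softmax attention weights of a single query vector over all tokens."""
    scores = tokens @ query / np.sqrt(tokens.shape[1])
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()


def suppress_peak(tokens, attention, num_blocked=1):
    """ASSUMPTION: 'block' the most-attended token(s) by zeroing them out."""
    blocked = np.argsort(-attention, kind="stable")[:num_blocked]
    keep = np.ones(len(tokens), dtype=bool)
    keep[blocked] = False
    return tokens * keep[:, None], blocked


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
projection = rng.standard_normal((8 * 8 * 3, 16))
tokens = image_to_tokens(image, patch_size=8, projection=projection)  # shape (16, 16)
attention = cls_attention(tokens, rng.standard_normal(16))
suppressed, blocked = suppress_peak(tokens, attention)
```

Zeroing is only one way to block a token; masking it out of the attention computation or dropping it from the sequence would serve the same purpose of pushing the model toward neglected regions.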