Transformer with Peak Suppression and Knowledge Guidance for Fine-grained Image Recognition
Xinda Liu, Lili Wang, Xiaoguang Han

TL;DR
This paper introduces a transformer architecture with peak suppression and knowledge guidance modules for fine-grained image recognition. The modules diversify the discriminative features drawn from a single image and aggregate clues across multiple images, and the method is reported to outperform existing methods on six datasets.
Contribution
The paper proposes a novel transformer-based method with peak suppression and knowledge guidance modules to enhance discriminative feature utilization in fine-grained recognition tasks.
Findings
Outperforms existing methods on six datasets
Enhances information exploitation of neglected regions
Improves discriminative clue aggregation across images
Abstract
Fine-grained image recognition is challenging because discriminative clues are usually fragmented, whether from a single image or multiple images. Despite their significant improvements, most existing methods still focus on the most discriminative parts from a single image, ignoring informative details in other regions and lacking consideration of clues from other associated images. In this paper, we analyze the difficulties of fine-grained image recognition from a new perspective and propose a transformer architecture with the peak suppression module and knowledge guidance module, which respects the diversification of discriminative features in a single image and the aggregation of discriminative clues among multiple images. Specifically, the peak suppression module first utilizes a linear projection to convert the input image into sequential tokens. It then blocks the token based on…
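The tokenization step mentioned in the abstract (a linear projection that converts the input image into sequential tokens) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `image_to_tokens`, the patch size, the token dimension, and the random weights are all assumptions chosen for illustration, and the sketch assumes the common scheme of splitting the image into non-overlapping patches before projecting. The subsequent token-blocking step is not shown, since the abstract is truncated before it is described.

```python
import random


def image_to_tokens(image, patch_size, weight, bias):
    """Hypothetical helper: split an H x W x C image (nested lists) into
    non-overlapping patches, flatten each patch, and apply one shared
    linear projection to obtain a sequence of tokens."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch in row-major order: rows, columns, channels.
            patch = [
                value
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for value in pixel
            ]
            # Linear projection: token = weight @ patch + bias.
            token = [
                sum(w * x for w, x in zip(w_row, patch)) + b
                for w_row, b in zip(weight, bias)
            ]
            tokens.append(token)
    return tokens


# Assumed toy sizes: an 8 x 8 RGB image, 4 x 4 patches, 6-dimensional tokens.
rng = random.Random(0)
H, W, C, P, D = 8, 8, 3, 4, 6
image = [[[rng.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
weight = [[rng.gauss(0.0, 0.02) for _ in range(P * P * C)] for _ in range(D)]
bias = [0.0] * D

tokens = image_to_tokens(image, P, weight, bias)
print(len(tokens), len(tokens[0]))  # 4 tokens, each of dimension 6
```

With these assumed sizes the image yields (8 / 4)² = 4 patches, so the sequence has four tokens. In a trained model the projection weights would be learned rather than sampled at random.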
