R2-Trans: Fine-Grained Visual Categorization with Redundancy Reduction
Yu Wang, Shuo Ye, Shujian Yu, Xinge You

TL;DR
R2-Trans introduces a novel FGVC method that reduces redundancy in class tokens and adaptively extracts discriminative regions, leading to improved accuracy on benchmark datasets.
Contribution
The paper proposes a new approach combining adaptive masking and the Information Bottleneck to enhance fine-grained visual categorization performance.
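This page does not give R2-Trans's exact Information Bottleneck objective. The generic IB principle looks for a representation Z of the input X that keeps information about the target Y while discarding the rest, i.e. it maximizes I(Z; Y) - β I(X; Z). The sketch below is a hedged illustration only. It uses the variational information bottleneck (VIB) of Alemi et al., a swapped-in technique that may differ from the paper's formulation, applied to a Gaussian encoding z of the class token. The names `sample_z` and `vib_loss`, the inputs `mu` and `log_var`, and the default `beta` are all hypothetical.

```python
import numpy as np


def sample_z(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps from N(mu, diag(sigma^2))."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def vib_loss(mu, log_var, logits, labels, beta=1e-3):
    """Hypothetical VIB-style loss for a compressed class token z.

    mu, log_var: (batch, d) parameters of the Gaussian encoder q(z | x).
    logits: (batch, num_classes) from a classifier applied to z.
    labels: (batch,) integer class indices.
    beta: compression weight (hypothetical default, not from the paper).
    """
    # Closed-form KL(q(z|x) || N(0, I)); averaged over the data it
    # upper-bounds I(X; Z), so penalizing it compresses the token
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)
    # Numerically stable log-softmax, then cross-entropy on the true class;
    # minimizing it maximizes a variational lower bound on I(Z; Y)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(labels.shape[0]), labels]
    return float(np.mean(ce + beta * kl))
```

With `mu` and `log_var` both zero the KL term vanishes, so all-zero logits over two classes give a loss of ln 2. Raising `beta` pushes q(z | x) toward the prior and squeezes more information out of the class token; set too high, it would also discard label-relevant cues.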
Findings
Outperforms state-of-the-art methods on benchmark datasets
Effectively reduces redundant information in class tokens
Improves discriminative region extraction accuracy
Abstract
Fine-grained visual categorization (FGVC) aims to discriminate similar subcategories; its main challenges are large intra-class diversity and subtle inter-class differences. Existing FGVC methods usually select discriminative regions found by a trained model, which tends to neglect other potentially discriminative information. Moreover, the massive interactions among the sequence of image patches in a ViT cause the resulting class token to contain a lot of redundant information, which may also hurt FGVC performance. In this paper, we present a novel approach for FGVC that can simultaneously exploit partial yet sufficient discriminative information in environmental cues and compress the redundant information in the class token with respect to the target. Specifically, our model calculates the ratio of high-weight regions in a batch, adaptively adjusts the masking…
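The abstract is cut off at the masking step, so this page does not say how the batch-level ratio sets the mask or which patches get masked. The following is a minimal NumPy sketch under loudly stated assumptions: the per-patch weights are class-token attention scores of shape (batch, num_patches); a patch counts as "high-weight" if its score exceeds its own image's mean score; the masking ratio equals the batch-wide fraction of high-weight patches; and the highest-scoring patches are the ones masked. The function names `batch_high_weight_ratio` and `adaptive_patch_mask` are hypothetical, and none of these choices should be read as R2-Trans's actual rule.

```python
import numpy as np


def batch_high_weight_ratio(attn: np.ndarray) -> float:
    """Fraction of patches whose weight exceeds its image's mean weight,
    pooled over the whole batch.

    attn: (batch, num_patches) class-token attention scores.
    The 'above the per-image mean' threshold is a hypothetical choice.
    """
    thresh = attn.mean(axis=1, keepdims=True)
    return float((attn > thresh).mean())


def adaptive_patch_mask(attn: np.ndarray) -> np.ndarray:
    """Boolean mask of shape (batch, num_patches); True marks a masked patch.

    Hypothetically masks the k highest-weighted patches in each image,
    where k is derived from the batch-level high-weight ratio.
    """
    _, num_patches = attn.shape
    ratio = batch_high_weight_ratio(attn)
    k = int(round(ratio * num_patches))
    mask = np.zeros_like(attn, dtype=bool)
    if k == 0:
        return mask
    # Column indices of the k largest scores in each row
    top = np.argsort(-attn, axis=1)[:, :k]
    np.put_along_axis(mask, top, True, axis=1)
    return mask
```

For example, with `attn = np.array([[0.1, 0.5, 0.3, 0.1], [0.4, 0.4, 0.1, 0.1]])`, half of all patches lie above their image's mean, so k = 2 and each image has its two highest-scoring patches masked. Because the ratio is pooled over the batch rather than computed per image, every image in a batch receives the same number of masked patches, and a batch in which a few patches dominate the attention yields a small k.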
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques · Remote-Sensing Image Classification
