Enhancing Fine-grained Image Classification through Attentive Batch Training
Duy M. Le, Bao Q. Bui, Anh Tran, Cong Tran, Cuong Pham

TL;DR
This paper introduces an attention-based batch training framework that improves fine-grained image classification accuracy by exploiting the relationships between the images within each training batch.
Contribution
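It proposes three components for batch training in fine-grained classification: Residual Relationship Attention (RRA), a module that integrates the visual feature vectors of the images in a batch; Relationship Position Encoding (RPE), a technique that encodes the positions of relationships between the original images so that this information is preserved; and Relationship Batch Integration (RBI), a framework that combines RRA with RPE to improve feature extraction.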
Findings
Improved accuracy by +2.78% on the CUB-200-2011 dataset.
Improved accuracy by +3.83% on the Stanford Dogs dataset.
Set a new state-of-the-art accuracy of 95.79% on Stanford Dogs.
Abstract
Fine-grained image classification, a challenging task in computer vision, requires precise differentiation among visually similar object categories. In this paper, we propose 1) a novel module called Residual Relationship Attention (RRA) that leverages the relationships between images within each training batch to effectively integrate the visual feature vectors of the batch images, and 2) a novel technique called Relationship Position Encoding (RPE), which encodes the positions of relationships between original images in a batch and effectively preserves the relationship information between images within the batch. Additionally, we design a novel framework, namely Relationship Batch Integration (RBI), which uses RRA in conjunction with RPE, allowing it to discern vital visual features that may remain elusive when examining a single image representative of a particular class.…
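The abstract does not spell out the internals of these modules. As a rough illustration only, the sketch below assumes RRA behaves like scaled dot-product self-attention taken across the batch dimension (one feature vector per image) followed by a residual connection, and it stands in for RPE with the sinusoidal encoding from "Attention Is All You Need" applied to each image's slot in the batch. The function names, the identity query/key/value projections, and the choice of encoding are hypothetical simplifications, not the authors' implementation.

```python
import math


def sinusoidal_encoding(position, dim):
    # Standard Transformer sinusoidal encoding for one position.
    # Used here as an ASSUMED stand-in for Relationship Position Encoding.
    enc = []
    for i in range(dim):
        angle = position / (10000 ** (2 * (i // 2) / dim))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def batch_relationship_attention(features):
    """Hypothetical residual self-attention across the images of one batch.

    features: list of B feature vectors (each a list of D floats), one per image.
    Returns B vectors: each original vector plus an attention-weighted mix of
    the position-encoded vectors of every image in the batch.
    """
    dim = len(features[0])
    # Tag each image's feature vector with its slot in the batch.
    encoded = [
        [x + p for x, p in zip(vec, sinusoidal_encoding(b, dim))]
        for b, vec in enumerate(features)
    ]
    scale = math.sqrt(dim)
    mixed_all = []
    for q in encoded:
        # Scaled dot-product score of this image against every image in the batch
        # (identity projections: no learned query/key/value weights).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in encoded]
        weights = softmax(scores)
        mixed = [sum(w * v[d] for w, v in zip(weights, encoded)) for d in range(dim)]
        mixed_all.append(mixed)
    # Residual connection: keep each image's own feature and add batch context.
    return [[x + m for x, m in zip(orig, mix)] for orig, mix in zip(features, mixed_all)]


if __name__ == "__main__":
    batch = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    out = batch_relationship_attention(batch)
    print(len(out), len(out[0]))  # 3 4
```

The residual path means each image keeps its own representation while gaining information from related images in the same batch; a trained model would replace the identity projections with learned weight matrices.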
Taxonomy
Topics: COVID-19 diagnosis using AI
Methods: Softmax · Attention Is All You Need
