TL;DR
This large-scale study (over 2000 experiments across 6 training and evaluation settings, 9 pretrained backbones, and 17 datasets) evaluates accuracy-vs-cost trade-offs in fine-grained image recognition and proposes methods that reduce inference cost while maintaining competitive accuracy.
Contribution
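The paper conducts extensive experiments on fine-grained image recognition (FGIR), extends Counterfactual Attention Learning (CAL) with a cross-image discriminative region mixing augmentation, and introduces an efficient evaluation-only variant that balances accuracy against inference cost.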
Findings
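Data-aware augmentations enable high accuracy without a forward pass on discriminative crops.
The evaluation-only variant reduces inference cost by skipping the crop forward pass that CAL and similar FGIR methods normally use, while keeping accuracy competitive.
Over 2000 experiments across 17 datasets and 9 pretrained backbones validate the methods.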
Abstract
Prior work on fine-grained image recognition (FGIR) has established the importance of backbone selection, but has neglected the accuracy-vs-cost trade-offs under different training and evaluation settings. In this work, we conduct a large-scale study with over 2000 experiments across 6 training and evaluation settings, 9 pretrained backbones, and 17 datasets. Preliminary observations on the effectiveness of data augmentation for fine-grained training motivate us to extend Counterfactual Attention Learning (CAL), a state-of-the-art method based on data-aware cropping and masking augmentations, with a cross-image discriminative region mixing augmentation. We also propose an efficient evaluation-only variant that maintains competitive accuracy while reducing inference costs by forgoing the forward pass on discriminative crops that is normally used by CAL and similar FGIR methods. Our…
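To make the idea of cross-image discriminative region mixing more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the function names (`discriminative_box`, `mix_discriminative_region`), the thresholding rule, the same-location paste, and the area-based mixing weight (in the style of CutMix) are all hypothetical choices. It also assumes the attention map has the same spatial size as the images.

```python
import numpy as np


def discriminative_box(attention, threshold=0.5):
    """Return (top, bottom, left, right) of the box enclosing all cells
    whose attention is at least threshold * max(attention).
    The thresholding rule is an assumption made for this sketch."""
    mask = attention >= threshold * attention.max()
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1


def mix_discriminative_region(target, source, source_attention, threshold=0.5):
    """Paste the high-attention box of `source` into a copy of `target`
    at the same location (hypothetical mixing rule).

    target, source: arrays of shape (H, W, C)
    source_attention: array of shape (H, W)
    Returns the mixed image and the fraction of pixels taken from `source`,
    which could serve as a label-mixing weight.
    """
    top, bottom, left, right = discriminative_box(source_attention, threshold)
    mixed = target.copy()
    mixed[top:bottom, left:right] = source[top:bottom, left:right]
    height, width = target.shape[:2]
    lam = (bottom - top) * (right - left) / (height * width)
    return mixed, lam


# Toy example: an all-zero target, an all-one source, and an attention
# map that highlights a 2x3 region of the source.
target = np.zeros((8, 8, 3))
source = np.ones((8, 8, 3))
attention = np.zeros((8, 8))
attention[2:4, 3:6] = 1.0
mixed, lam = mix_discriminative_region(target, source, attention)
```

In this toy case the pasted region covers 6 of 64 pixels, so the returned weight is 6/64. A real pipeline would obtain the attention map from the model rather than construct it by hand.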
