A Large-Scale Study on the Accuracy vs Cost Trade-offs of Training and Evaluation Settings in Fine-Grained Image Recognition

Edwin Arkel Rios, Augusto Christian Surya, Oswin Gosal, Fernando Mikael, Mary Madeline Nicole, Kisoon Jang, Bo-Cheng Lai, Min-Chun Hu

arXiv:2605.18700·cs.CV·May 19, 2026

TL;DR

This large-scale study measures accuracy-vs-cost trade-offs in fine-grained image recognition (FGIR) across training and evaluation settings, and proposes an evaluation-only variant that reduces inference cost while keeping accuracy competitive.

Contribution

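The paper runs over 2000 experiments on fine-grained image recognition (FGIR), extends Counterfactual Attention Learning (CAL) with a cross-image discriminative region mixing augmentation, and introduces an efficient evaluation-only variant that skips the forward pass on discriminative crops to reduce inference cost.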
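The three data-aware augmentations named in the abstract (cropping, masking, and cross-image discriminative region mixing) can be illustrated with a toy sketch. This is not the authors' implementation: the thresholded bounding box, the paste-at-same-location mixing rule, the area-based label weight (borrowed from CutMix-style mixing), and every function name below are assumptions made for illustration, using nested lists in place of image tensors.

```python
from typing import List, Tuple

Grid = List[List[float]]          # toy single-channel image or attention map
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), end-exclusive


def discriminative_box(attention: Grid, ratio: float = 0.5) -> Box:
    """Smallest box covering every cell with attention >= ratio * peak (assumed rule)."""
    cutoff = ratio * max(max(row) for row in attention)
    rows = [r for r, row in enumerate(attention) if any(v >= cutoff for v in row)]
    cols = [c for c in range(len(attention[0]))
            if any(row[c] >= cutoff for row in attention)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop_region(image: Grid, box: Box) -> Grid:
    """Data-aware crop: keep only the discriminative box."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def mask_region(image: Grid, box: Box, fill: float = 0.0) -> Grid:
    """Data-aware mask: blank out the discriminative box, keep the rest."""
    top, left, bottom, right = box
    return [[fill if top <= r < bottom and left <= c < right else v
             for c, v in enumerate(row)]
            for r, row in enumerate(image)]


def mix_region(target: Grid, source: Grid, box: Box) -> Tuple[Grid, float]:
    """Paste the source's box into the target; return the image and source area share."""
    top, left, bottom, right = box
    mixed = [list(row) for row in target]
    for r in range(top, bottom):
        mixed[r][left:right] = source[r][left:right]
    share = (bottom - top) * (right - left) / (len(target) * len(target[0]))
    return mixed, share


# Toy data: image A is all 1.0, image B is all 2.0, B's attention peaks in the center.
image_a = [[1.0] * 4 for _ in range(4)]
image_b = [[2.0] * 4 for _ in range(4)]
attention_b = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.8, 0.0],
    [0.0, 0.7, 1.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]

box_b = discriminative_box(attention_b)
cropped_b = crop_region(image_b, box_b)
masked_b = mask_region(image_b, box_b)
mixed_ab, share_b = mix_region(image_a, image_b, box_b)
```

In this toy case `box_b` covers the central 2x2 cells, so a quarter of `mixed_ab` comes from image B. Whether the mixed sample's label should be weighted by that area share is a design choice this sketch leaves open.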

Findings

01

Data-aware augmentations enable high accuracy without crops.

02

The evaluation-only variant reduces inference cost by skipping the forward pass on discriminative crops, while keeping accuracy competitive.

03

Over 2000 experiments across 6 training and evaluation settings, 9 pretrained backbones, and 17 datasets validate the methods.

Abstract

Prior work on fine-grained image recognition (FGIR) has established the importance of the backbone selection, but has neglected the accuracy-vs-cost trade-offs under different training and evaluation settings. In this work we conduct a large-scale study with over 2000 experiments across 6 training and evaluation settings, 9 pretrained backbones, and 17 datasets. Preliminary observations on the effectiveness of data augmentation for fine-grained training motivate us to extend Counterfactual Attention Learning (CAL), a state-of-the-art method based on data-aware cropping and masking augmentations, with cross-image discriminative region mixing augmentation. We also propose an efficient evaluation-only variant that maintains competitive accuracy while reducing inference costs by forfeiting the forward pass on discriminative crops that is normally used by CAL and similar FGIR methods. Our…
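The inference-cost argument for the evaluation-only variant can be made concrete with a small sketch that counts backbone forward passes. It assumes that crop-based evaluation runs the backbone once on the full image and once on a crop, then averages the two score vectors; the averaging rule, the center crop used in place of an attention-derived crop, and the names `CountingBackbone`, `predict_with_crop`, and `predict_full_only` are all hypothetical and not taken from the paper or its repository.

```python
from typing import List

Image = List[List[float]]
Scores = List[float]


class CountingBackbone:
    """Hypothetical stand-in for a pretrained backbone that counts forward passes."""

    def __init__(self) -> None:
        self.forward_passes = 0

    def __call__(self, image: Image) -> Scores:
        self.forward_passes += 1
        pixels = [v for row in image for v in row]
        mean = sum(pixels) / len(pixels)
        return [mean, 1.0 - mean]  # toy two-class scores, not real logits


def center_crop(image: Image) -> Image:
    """Placeholder crop; a data-aware method would derive the box from attention."""
    h, w = len(image), len(image[0])
    return [row[w // 4: w - w // 4] for row in image[h // 4: h - h // 4]]


def predict_with_crop(model: CountingBackbone, image: Image) -> Scores:
    """Two passes per image: full image plus crop, scores averaged (assumed rule)."""
    full = model(image)
    cropped = model(center_crop(image))
    return [(a + b) / 2 for a, b in zip(full, cropped)]


def predict_full_only(model: CountingBackbone, image: Image) -> Scores:
    """One pass per image: the crop pass is skipped at evaluation time."""
    return model(image)


# Toy 4x4 image with a bright 2x2 center.
image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

crop_model = CountingBackbone()
scores_with_crop = predict_with_crop(crop_model, image)

full_model = CountingBackbone()
scores_full_only = predict_full_only(full_model, image)

passes_saved = crop_model.forward_passes - full_model.forward_passes
```

Under these assumptions the crop-based path costs two backbone passes per image and the evaluation-only path costs one. The sketch only shows where the saving comes from; actual savings depend on crop resolution and the backbone, and the paper's measured accuracy and cost figures are not reproduced here.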

Code & Models

Repositories

arkel23/FGIR-Backbones
github
