Fine-Grained Image Recognition from Scratch with Teacher-Guided Data Augmentation

Edwin Arkel Rios; Fernando Mikael; Oswin Gosal; Femiloye Oyerinde; Hao-Chun Liang; Bo-Cheng Lai; Min-Chun Hu

arXiv:2507.12157 · cs.CV · July 17, 2025

TL;DR

This paper demonstrates that high-performance fine-grained image recognition can be achieved from scratch using a novel teacher-guided data augmentation framework, enabling task-specific architectures without reliance on pretrained models.

Contribution

Introduces TGDA, a training framework that combines data-aware augmentation and weak supervision, allowing effective training of FGIR models from scratch and facilitating the development of efficient, task-specific architectures.

Findings

01 Models trained from scratch with TGDA match or surpass pretrained models.

02 LRNets with TGDA improve accuracy by up to 23% in low-resolution FGIR.

03 ViTFS-T achieves performance comparable to a pretrained ViT B-16 with significantly fewer parameters.

Abstract

Fine-grained image recognition (FGIR) aims to distinguish visually similar sub-categories within a broader class, such as identifying bird species. While most existing FGIR methods rely on backbones pretrained on large-scale datasets like ImageNet, this dependence limits adaptability to resource-constrained environments and hinders the development of task-specific architectures tailored to the unique challenges of FGIR. In this work, we challenge the conventional reliance on pretrained models by demonstrating that high-performance FGIR systems can be trained entirely from scratch. We introduce a novel training framework, TGDA, that integrates data-aware augmentation with weak supervision via a fine-grained-aware teacher model, implemented through knowledge distillation. This framework unlocks the design of task-specific and hardware-aware architectures, including LRNets for…
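
The abstract says the teacher's weak supervision is implemented through knowledge distillation. As a rough illustration of that general technique, the sketch below computes a standard temperature-scaled distillation loss for one sample. It is not the paper's exact objective: the function names, the `temperature` and `alpha` values, and the assumption that student and teacher logits come from the same augmented view of an image are all hypothetical choices made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic distillation loss for a single sample (illustrative only).

    Combines hard-label cross-entropy with the KL divergence between
    temperature-softened teacher and student distributions.
    `temperature` and `alpha` are assumed hyperparameters, not values
    taken from the paper.
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    ce = -math.log(softmax(student_logits)[label])

    # Soft-label term: KL(teacher || student) at the given temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student) if t > 0)

    # The T^2 factor keeps the soft term's gradient scale comparable
    # across temperatures.
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl


# Example: logits for a 3-class problem, ground-truth class 0.
loss = distillation_loss([2.0, 0.5, -1.0], [3.0, 0.2, -2.0], label=0)
```

In a training loop, a loss of this kind would typically be averaged over a batch and computed with a deep learning framework's built-in softmax and KL divergence functions rather than plain Python lists.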

Taxonomy

Topics: Industrial Vision Systems and Defect Detection