Fine-Grained Image Recognition from Scratch with Teacher-Guided Data Augmentation
Edwin Arkel Rios, Fernando Mikael, Oswin Gosal, Femiloye Oyerinde, Hao-Chun Liang, Bo-Cheng Lai, Min-Chun Hu

TL;DR
This paper demonstrates that high-performance fine-grained image recognition can be achieved from scratch using a novel teacher-guided data augmentation framework, enabling task-specific architectures without reliance on pretrained models.
Contribution
Introduces TGDA, a training framework that combines data-aware augmentation and weak supervision, allowing effective training of FGIR models from scratch and facilitating the development of efficient, task-specific architectures.
Findings
TGDA enables training from scratch to match or surpass pretrained models.
LRNets with TGDA improve accuracy by up to 23% in low-resolution FGIR.
ViTFS-T achieves performance comparable to a pretrained ViT-B/16 with significantly fewer parameters.
Abstract
Fine-grained image recognition (FGIR) aims to distinguish visually similar sub-categories within a broader class, such as identifying bird species. While most existing FGIR methods rely on backbones pretrained on large-scale datasets like ImageNet, this dependence limits adaptability to resource-constrained environments and hinders the development of task-specific architectures tailored to the unique challenges of FGIR. In this work, we challenge the conventional reliance on pretrained models by demonstrating that high-performance FGIR systems can be trained entirely from scratch. We introduce a novel training framework, TGDA, that integrates data-aware augmentation with weak supervision via a fine-grained-aware teacher model, implemented through knowledge distillation. This framework unlocks the design of task-specific and hardware-aware architectures, including LRNets for…
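The abstract says the teacher's weak supervision is implemented through knowledge distillation but does not give the loss. As a rough illustration only, the sketch below shows the standard temperature-scaled distillation objective (hard-label cross-entropy blended with a KL term against the teacher's softened predictions). The function names and the `temperature` and `alpha` values are assumptions for this example, not details from the paper, and TGDA's actual formulation may differ.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_logits, label):
    """Cross-entropy of the student's prediction against the hard label."""
    return -math.log(softmax(student_logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic distillation loss (assumed form, not taken from the paper).

    alpha weights the soft teacher term; (1 - alpha) weights the hard label.
    The T^2 factor keeps the soft term's gradient scale comparable across
    temperatures.
    """
    hard = cross_entropy(student_logits, label)
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    return (1 - alpha) * hard + alpha * soft

if __name__ == "__main__":
    # Hypothetical logits for a 3-class problem; the true class is index 0.
    student = [1.0, 0.5, -0.2]
    teacher = [2.5, 0.3, -1.0]
    print(distillation_loss(student, teacher, label=0))
```

In a from-scratch setting like the one described, the student would see augmented images while the teacher's soft predictions supply the extra signal; how the augmentation and the teacher interact in TGDA is not specified in this summary.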
Taxonomy
Topics: Industrial Vision Systems and Defect Detection
