SGIA: Enhancing Fine-Grained Visual Classification with Sequence Generative Image Augmentation
Qiyu Liao, Xin Yuan, Min Xu, Dadong Wang

TL;DR
This paper introduces SGIA, a data augmentation method that combines a Sequence Latent Diffusion Model (SLDM) with Bridging Transfer Learning (BTL) to generate more realistic and diverse images, improving fine-grained visual classification (FGVC), especially in few-shot scenarios.
Contribution
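The study presents a generative augmentation approach, paired with Bridging Transfer Learning (BTL) to narrow the domain gap between real and synthetic data, that increases dataset variability and realism and outperforms existing augmentation methods on FGVC tasks.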
Findings
Outperforms existing augmentation methods in FGVC accuracy
Generates more realistic and diverse images with pose variations
Achieves a 0.5% accuracy improvement on CUB-200-2011
Abstract
In Fine-Grained Visual Classification (FGVC), distinguishing highly similar subcategories remains a formidable challenge, often necessitating datasets with extensive variability. The acquisition and annotation of such FGVC datasets are notably difficult and costly, demanding specialized knowledge to identify subtle distinctions among closely related categories. Our study introduces a novel approach employing the Sequence Latent Diffusion Model (SLDM) for augmenting FGVC datasets, called Sequence Generative Image Augmentation (SGIA). Our method features a unique Bridging Transfer Learning (BTL) process, designed to minimize the domain gap between real and synthetically augmented data. This approach notably surpasses existing methods in generating more realistic image samples, providing a diverse range of pose transformations that extend beyond the traditional rigid transformations and…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: Diffusion · Latent Diffusion Model
