SGIA: Enhancing Fine-Grained Visual Classification with Sequence Generative Image Augmentation

Qiyu Liao; Xin Yuan; Min Xu; Dadong Wang

arXiv:2412.06138·cs.CV·December 10, 2024

Open Access

TL;DR

This paper introduces SGIA, a novel data augmentation method using Sequence Latent Diffusion Models and Bridging Transfer Learning to improve fine-grained visual classification, especially in few-shot scenarios, by generating more realistic and diverse images.

Contribution

The study presents a new generative augmentation approach with BTL that enhances dataset variability and realism, outperforming existing methods in FGVC tasks.

Findings

01 Outperforms existing augmentation methods in FGVC accuracy

02 Generates more realistic and diverse images with pose variations

03 Achieves a 0.5% accuracy improvement on CUB-200-2011

Abstract

In Fine-Grained Visual Classification (FGVC), distinguishing highly similar subcategories remains a formidable challenge, often necessitating datasets with extensive variability. The acquisition and annotation of such FGVC datasets are notably difficult and costly, demanding specialized knowledge to identify subtle distinctions among closely related categories. Our study introduces a novel approach employing the Sequence Latent Diffusion Model (SLDM) for augmenting FGVC datasets, called Sequence Generative Image Augmentation (SGIA). Our method features a unique Bridging Transfer Learning (BTL) process, designed to minimize the domain gap between real and synthetically augmented data. This approach notably surpasses existing methods in generating more realistic image samples, providing a diverse range of pose transformations that extend beyond the traditional rigid transformations and…

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications

Methods: Diffusion · Latent Diffusion Model