Provably Improving Generalization of Few-Shot Models with Synthetic Data

Lan-Cuong Nguyen; Quan Nguyen-Tri; Bang Tran Khanh; Dung D. Le; Long Tran-Thanh; Khoat Than

arXiv:2505.24190 · cs.LG · June 26, 2025

TL;DR

This paper introduces a theoretical framework and a new algorithm that improve the generalization of few-shot image classification models by using synthetic data while accounting for the distribution gap between real and synthetic samples.

Contribution

The paper develops a theoretical account of how synthetic data affects generalization and proposes a prototype-learning-based algorithm to improve the generalization of few-shot models.

Findings

1. Outperforms state-of-the-art methods on multiple datasets
2. Provides a theoretical basis for synthetic data generation
3. Effectively bridges the gap between real and synthetic data

Abstract

Few-shot image classification remains challenging due to the scarcity of labeled training examples. Augmenting them with synthetic data has emerged as a promising way to alleviate this issue, but models trained on synthetic samples often face performance degradation due to the inherent gap between real and synthetic distributions. To address this limitation, we develop a theoretical framework that quantifies the impact of such distribution discrepancies on supervised learning, specifically in the context of image classification. More importantly, our framework suggests practical ways to generate good synthetic samples and to train a predictor with high generalization ability. Building upon this framework, we propose a novel theoretical-based algorithm that integrates prototype learning to optimize both data partitioning and model training, effectively bridging the gap between real…
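The abstract mentions prototype learning and the gap between real and synthetic distributions without giving algorithmic detail here. As a rough illustration only, the following is a generic nearest-prototype classifier over feature vectors, not the authors' algorithm. The function names, the `real_weight` mixing coefficient, the mean-distance gap proxy, and the toy data are all assumptions introduced for this sketch.

```python
from math import dist


def mean_vector(vectors):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_prototypes(real, synthetic, real_weight=0.7):
    """Blend per-class means of real and synthetic features into prototypes.

    real, synthetic: dict mapping label -> list of feature vectors.
    real_weight: hypothetical mixing coefficient in [0, 1] (not from the paper).
    """
    prototypes = {}
    for label, real_vecs in real.items():
        real_mean = mean_vector(real_vecs)
        synth_vecs = synthetic.get(label)
        if not synth_vecs:
            # No synthetic samples for this class: fall back to the real mean.
            prototypes[label] = real_mean
            continue
        synth_mean = mean_vector(synth_vecs)
        prototypes[label] = [
            real_weight * r + (1.0 - real_weight) * s
            for r, s in zip(real_mean, synth_mean)
        ]
    return prototypes


def prototype_gap(real, synthetic):
    """Distance between real and synthetic class means.

    A crude, assumed proxy for the real-synthetic distribution gap.
    """
    return {
        label: dist(mean_vector(real[label]), mean_vector(synthetic[label]))
        for label in real
        if synthetic.get(label)
    }


def predict(prototypes, x):
    """Assign x to the label of the nearest prototype (Euclidean distance)."""
    return min(prototypes, key=lambda label: dist(prototypes[label], x))


# Toy 2-D features standing in for image embeddings (made-up values).
real = {"cat": [[0.0, 0.0], [0.2, 0.0]], "dog": [[1.0, 1.0]]}
synthetic = {"cat": [[0.1, 0.3]], "dog": [[1.2, 0.8], [0.8, 1.2]]}

prototypes = build_prototypes(real, synthetic)
gaps = prototype_gap(real, synthetic)
label_a = predict(prototypes, [0.2, 0.1])
label_b = predict(prototypes, [0.9, 0.8])
```

In this toy setup, a large entry in `gaps` flags a class whose synthetic samples sit far from the real ones, which is the kind of discrepancy the abstract says degrades performance. How the paper actually measures and reduces that gap is not shown on this page.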

Videos

Provably Improving Generalization of Few-Shot Models with Synthetic Data (SlidesLive)

Taxonomy

Topics: Model Reduction and Neural Networks