LiBaGS: Lightweight Boundary Gap Synthesis for Targeted Synthetic Data Selection
Abhishek Moturu, Anna Goldenberg, Babak Taati

TL;DR
LiBaGS is a lightweight, generator-agnostic method for selecting synthetic data that improves training by targeting boundary regions and avoiding redundancy.
Contribution
It introduces a boundary-gap-based selection method that combines a marginal-value stopping rule, soft labels near ambiguous boundaries, and a diversity objective to improve model training with synthetic data.
Findings
LiBaGS outperforms classical oversampling and augmentation methods.
It targets sparse but realistic decision-boundary neighborhoods.
Experiments demonstrate improved accuracy with LiBaGS.
Abstract
Synthetic data is useful only when the added samples fill missing parts of the training distribution that matter for the downstream task. We introduce LiBaGS, a lightweight, generator-agnostic method for targeted synthetic training data selection. LiBaGS scores candidate synthetic samples by combining decision-boundary proximity, predictive uncertainty, real-data density, and support validity, so that selected samples are both informative and likely to remain on the real data manifold. We then use a boundary-gap allocation rule that targets sparse but realistic decision-boundary neighborhoods, rather than simply adding more data or selecting only the most uncertain candidates. LiBaGS also learns when enough synthetic samples have been added through a marginal-value stopping rule, assigns softer labels near ambiguous boundaries, and uses a diversity objective to avoid redundant…
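The scoring idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the excerpt does not give the exact formula, so the multiplicative combination, the top-two probability margin as a boundary-proximity proxy, the k-nearest-neighbor distance as a density proxy, and the names `score_candidates`, `knn_distances`, and `support_radius` are all assumptions made for illustration.

```python
import numpy as np

def knn_distances(queries, reference, k):
    """Sorted Euclidean distances from each query to its k nearest reference points."""
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    k = min(k, reference.shape[0])
    return np.sort(dists, axis=1)[:, :k]

def score_candidates(probs, candidates, real, k=3, support_radius=1.0):
    """Hypothetical candidate score; every term below is an illustrative assumption.

    probs:      (n, c) class probabilities from any classifier, for the candidates
    candidates: (n, d) synthetic samples
    real:       (m, d) real training samples
    """
    probs = np.asarray(probs, dtype=float)
    # Boundary proximity: a small gap between the two largest probabilities.
    top2 = np.sort(probs, axis=1)[:, -2:]
    boundary = 1.0 - (top2[:, 1] - top2[:, 0])
    # Predictive uncertainty: entropy normalized to [0, 1].
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    uncertainty = entropy / np.log(probs.shape[1])
    # Real-data sparsity: a larger mean k-NN distance means fewer real points nearby.
    nn = knn_distances(candidates, real, k)
    mean_dist = nn.mean(axis=1)
    sparsity = mean_dist / (mean_dist + support_radius)
    # Support validity: reject candidates whose nearest real point is too far away.
    valid = (nn[:, 0] <= support_radius).astype(float)
    return boundary * uncertainty * sparsity * valid

# Toy data: two real clusters and three synthetic candidates.
real = np.array([[0.0, 0.0], [0.2, 0.1], [2.0, 2.0], [2.1, 1.9]])
candidates = np.array([[1.0, 1.0], [0.1, 0.0], [9.0, 9.0]])
probs = np.array([[0.5, 0.5], [0.99, 0.01], [0.5, 0.5]])
scores = score_candidates(probs, candidates, real, k=2, support_radius=1.5)
best = int(np.argmax(scores))
```

In this toy setup the first candidate sits between the two real clusters with an ambiguous prediction, the second lies inside a dense cluster with a confident prediction, and the third is far from all real data. Under the assumed score, the third is zeroed out by the validity check and the first ranks highest. The stopping rule, soft labels, and diversity objective mentioned in the abstract are not covered by this sketch.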
