Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
Sreyan Ghosh, Sonal Kumar, Zhifeng Kong, Rafael Valle, Bryan Catanzaro, Dinesh Manocha

TL;DR
Synthio introduces a synthetic data augmentation method for small-scale audio classification datasets. It generates training audio with a text-to-audio (T2A) diffusion model that is aligned with the target dataset's acoustic characteristics and prompted with diverse captions written by a large language model (LLM), improving classification accuracy over augmentation baselines.
Contribution
The paper combines preference optimization, which aligns the T2A model's generations with the small-scale dataset, with LLM-generated captions, which add compositional diversity, to produce synthetic audio that is both acoustically consistent and diverse.
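Preference optimization is named here but not specified. As an illustration only, the sketch below implements a generic Diffusion-DPO-style objective in NumPy; it is not presented as the paper's exact formulation. It assumes the preferred ("winner") sample is a real clip from the small-scale dataset and the rejected ("loser") sample is a T2A generation. That pairing, the `beta` value, and all function names are assumptions, and the per-timestep weighting used in published Diffusion-DPO formulations is omitted.

```python
import numpy as np


def _sq_err(eps_true, eps_pred):
    # Per-example squared denoising error, summed over all non-batch axes
    diff = eps_true - eps_pred
    return np.sum(diff ** 2, axis=tuple(range(1, diff.ndim)))


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x))
    return -np.logaddexp(0.0, -x)


def diffusion_dpo_loss(eps_w, pred_w, ref_w, eps_l, pred_l, ref_l, beta=1.0):
    """Hypothetical Diffusion-DPO-style loss (sketch, timestep weighting omitted).

    eps_*  : noise actually added to the winner (w) / loser (l) latents
    pred_* : noise predicted by the model being fine-tuned
    ref_*  : noise predicted by the frozen reference model
    beta   : assumed strength of the implicit regularizer toward the reference
    """
    # Negative gap = fine-tuned model denoises better than the reference
    win_gap = _sq_err(eps_w, pred_w) - _sq_err(eps_w, ref_w)
    lose_gap = _sq_err(eps_l, pred_l) - _sq_err(eps_l, ref_l)
    # Reward improving on winners more than on losers
    logits = -beta * (win_gap - lose_gap)
    return float(np.mean(-log_sigmoid(logits)))


rng = np.random.default_rng(0)
shape = (4, 8, 16)  # (batch, channels, frames) of hypothetical audio latents
eps_w, eps_l = rng.normal(size=shape), rng.normal(size=shape)
ref_w, ref_l = rng.normal(size=shape), rng.normal(size=shape)
# When the fine-tuned model equals the reference, the loss is log 2
print(diffusion_dpo_loss(eps_w, ref_w, ref_w, eps_l, ref_l, ref_l))  # ≈ 0.6931
```

The loss falls below log 2 only when the model improves on the reference more for preferred samples than for rejected ones, which is what pushes generations toward the real data distribution in this framing.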
Findings
Outperforms baselines by up to 39% in accuracy.
Effective across ten datasets in limited-data scenarios.
Uses a T2A model trained only on weakly captioned AudioSet.
Abstract
We present Synthio, a novel approach for augmenting small-scale audio classification datasets with synthetic data. Our goal is to improve audio classification accuracy with limited labeled data. Traditional data augmentation techniques, which apply artificial transformations (e.g., adding random noise or masking segments), struggle to create data that captures the true diversity present in real-world audio. To address this shortcoming, we propose to augment the dataset with synthetic audio generated from text-to-audio (T2A) diffusion models. However, synthesizing effective augmentations is challenging: not only should the generated data be acoustically consistent with the underlying small-scale dataset, but it should also have sufficient compositional diversity. To overcome the first challenge, we align the generations of the T2A model with the small-scale dataset using…
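The abstract contrasts two ways of enlarging a small labeled dataset: artificial transformations of existing clips, and adding new synthetic clips. The minimal NumPy sketch below illustrates both, assuming mono waveforms stored as 1-D arrays. The SNR and mask fraction are illustrative defaults, not values from the paper, and `generate_fn` is a hypothetical stand-in for a T2A model prompted with a caption for each label.

```python
import numpy as np


def add_random_noise(wave, rng, snr_db=20.0):
    # Traditional augmentation: add white Gaussian noise at a target SNR (dB)
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return wave + rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)


def mask_segment(wave, rng, max_frac=0.1):
    # Traditional augmentation: zero out one random contiguous segment
    seg_len = max(1, int(len(wave) * max_frac))
    start = int(rng.integers(0, len(wave) - seg_len + 1))
    out = wave.copy()
    out[start:start + seg_len] = 0.0
    return out


def augment_with_synthetic(real, generate_fn, n_per_real=1):
    """Append n_per_real synthetic clips for every real (wave, label) pair.

    generate_fn(label) -> waveform is hypothetical; it stands in for a
    text-to-audio model conditioned on a caption for that label.
    """
    synthetic = [(generate_fn(label), label)
                 for _, label in real for _ in range(n_per_real)]
    return list(real) + synthetic


rng = np.random.default_rng(0)
real = [(rng.normal(size=16000), "dog_bark"), (rng.normal(size=16000), "siren")]
fake_t2a = lambda label: rng.normal(size=16000)  # placeholder, not a real T2A model
data = augment_with_synthetic(real, fake_t2a, n_per_real=2)
print(len(data))  # 6
```

The first two functions only perturb clips that already exist, so they cannot introduce new acoustic content; the third adds new clips per label, which is where the alignment and caption diversity described above would matter.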
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing
Methods: ALIGN · Diffusion
