AudioSetMix: Enhancing Audio-Language Datasets with LLM-Assisted Augmentations

David Xu

arXiv:2405.11093 · eess.AS · June 10, 2024

Open Access

TL;DR

AudioSetMix introduces a scalable, LLM-assisted data augmentation method that enhances audio-language datasets, leading to improved model performance and addressing data quality and diversity limitations in the domain.

Contribution

The paper presents a novel LLM-based augmentation technique to generate high-quality, diverse audio-caption pairs, significantly improving dataset size and quality for audio-language learning.

Findings

01

Improved model performance on multiple benchmarks.

02

Addresses lack of modifiers in existing datasets.

03

Achieves state-of-the-art results with augmented data.

Abstract

Multi-modal learning in the audio-language domain has seen significant advancements in recent years. However, audio-language learning faces challenges due to limited and lower-quality data compared to image-language tasks. Existing audio-language datasets are notably smaller, and manual labeling is hindered by the need to listen to entire audio clips for accurate labeling. Our method systematically generates audio-caption pairs by augmenting audio clips with natural language labels and corresponding audio signal processing operations. Leveraging a Large Language Model, we generate descriptions of augmented audio clips with a prompt template. This scalable method produces AudioSetMix, a high-quality training dataset for text-and-audio related models. Integration of our dataset improves model performance on benchmarks by providing diversified and better-aligned examples. Notably, our…
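The pipeline outlined in the abstract (apply a signal processing operation, record a matching natural language label, then fill a prompt template for an LLM) might look roughly like the sketch below. This is an illustration under stated assumptions, not the paper's implementation: the function names (`apply_gain`, `mix`, `build_prompt`), the modifier vocabulary ("louder", "quieter"), the prompt wording, and the use of synthetic sine tones in place of real clips are all hypothetical.

```python
import math

def apply_gain(samples, gain_db):
    """Scale a clip by gain_db decibels; return (new_samples, modifier_label).

    The labels are illustrative assumptions, not the paper's vocabulary.
    """
    factor = 10 ** (gain_db / 20)
    if gain_db > 0:
        label = "louder"
    elif gain_db < 0:
        label = "quieter"
    else:
        label = "unchanged"
    return [s * factor for s in samples], label

def mix(a, b):
    """Overlay two clips sample by sample, padding the shorter one with silence."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def build_prompt(events):
    """Fill a hypothetical prompt template listing each sound and its modifier."""
    parts = [f"{name} ({modifier})" for name, modifier in events]
    return ("Write one sentence describing an audio clip containing: "
            + ", ".join(parts) + ".")

def sine(freq, n=8, rate=8000):
    """Synthetic stand-in for a real audio clip."""
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

# Two stand-in clips with hypothetical source labels.
dog, label_dog = apply_gain(sine(440.0), -6.0)
rain, label_rain = apply_gain(sine(220.0, n=12), 3.0)

mixed = mix(dog, rain)
prompt = build_prompt([("dog barking", label_dog), ("rain", label_rain)])
print(prompt)
```

In a real pipeline the prompt string would be sent to an LLM, and the returned caption paired with `mixed` as one training example; that call is omitted here because the model and interface used are not specified in this summary.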

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Diverse Musicological Studies