Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?

Tiantian Feng; Dimitrios Dimitriadis; Shrikanth Narayanan

arXiv:2406.08800·cs.SD·August 30, 2024

TL;DR

This paper investigates whether synthetic audio generated by foundation models can improve audio recognition and speech modeling, both as training data and as a resource for data augmentation.

Contribution

It provides a comprehensive evaluation of synthetic audio's utility in audio recognition and speech modeling, highlighting its potential as a training resource.

Findings

01. Synthetic audio can enhance audio recognition performance.
02. Synthetic data serves as a valuable resource for speech modeling.
03. Experiments demonstrate the effectiveness of synthetic audio in various tasks.

Abstract

Recent advances in foundation models have enabled audio-generative models that produce high-fidelity sounds associated with music, events, and human actions. Despite the success of modern audio-generative models, the conventional approach to assessing the quality of generated audio relies heavily on distance metrics such as Fréchet Audio Distance. In contrast, we aim to evaluate the quality of audio generation by examining how effective the generated audio is as training data. Specifically, we conduct studies to explore the use of synthetic audio for audio recognition. Moreover, we investigate whether synthetic audio can serve as a resource for data augmentation in speech-related modeling. Our comprehensive experiments demonstrate the potential of using synthetic audio for audio recognition and speech-related modeling. Our code is available at…
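
The evaluation idea in the abstract (judge generated audio by how well a model trained on it performs on real data) can be illustrated with a toy sketch. This is not the authors' pipeline: the 2-D "features", the class names `dog_bark` and `siren`, the nearest-centroid classifier, and the shifted synthetic centers are all hypothetical stand-ins chosen only to keep the example self-contained.

```python
import random
from statistics import fmean


def make_features(center, n, spread, rng):
    """Draw n toy feature vectors around a class center (stand-in for audio embeddings)."""
    return [[rng.gauss(c, spread) for c in center] for _ in range(n)]


def fit_centroids(features_by_label):
    """'Train' a nearest-centroid classifier: one mean vector per class."""
    return {
        label: [fmean(dim) for dim in zip(*vectors)]
        for label, vectors in features_by_label.items()
    }


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], x)),
    )


def accuracy(centroids, features_by_label):
    """Fraction of vectors whose predicted label matches the true label."""
    total = sum(len(vectors) for vectors in features_by_label.values())
    correct = sum(
        predict(centroids, x) == label
        for label, vectors in features_by_label.items()
        for x in vectors
    )
    return correct / total


rng = random.Random(0)

# Hypothetical class centers for "real" audio features.
centers = {"dog_bark": [0.0, 0.0], "siren": [4.0, 4.0]}

real_train = {k: make_features(c, 50, 1.0, rng) for k, c in centers.items()}
real_test = {k: make_features(c, 50, 1.0, rng) for k, c in centers.items()}

# Assumption: synthetic features are offset from the real ones to mimic a domain gap.
synthetic_train = {
    k: make_features([v + 0.5 for v in c], 50, 1.0, rng) for k, c in centers.items()
}

# Both models are scored on the same held-out real data.
acc_real = accuracy(fit_centroids(real_train), real_test)
acc_synth = accuracy(fit_centroids(synthetic_train), real_test)

print(f"train on real,      test on real: {acc_real:.2f}")
print(f"train on synthetic, test on real: {acc_synth:.2f}")
```

Comparing the two accuracies on the same real test set gives a task-based measure of how useful the synthetic data is, which complements a distribution-level distance such as Fréchet Audio Distance.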

Code & Models

Repositories

usc-sail/synthaudio
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing