Synth-AC: Enhancing Audio Captioning with Synthetic Supervision

Feiyang Xiao; Qiaoxi Zhu; Jian Guan; Xubo Liu; Haohe Liu; Kejia Zhang; Wenwu Wang

arXiv:2309.09705 · cs.SD · September 19, 2023 · 1 citation

TL;DR

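Synth-AC is a framework that improves audio captioning models with synthetic audio generated from text. It addresses the scarcity of text-audio training data by reusing captions from another domain (image captioning) and generating matching audio for them with a text-to-audio generative model.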

Contribution

The paper presents Synth-AC, a novel approach that creates synthetic text-audio pairs using audio generative models to enhance audio captioning performance.
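The core idea, pairing existing captions with audio generated from them, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `build_synthetic_pairs`, `SyntheticPair`, and `stub_generator` are hypothetical names, the stub stands in for a real text-to-audio model such as AudioLDM (Hugging Face diffusers, for example, provides an `AudioLDMPipeline`), and the 16 kHz default sample rate is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence

# Assumed interface: a text-to-audio generator maps one caption to a mono
# waveform. A real system might wrap AudioLDM here; this sketch accepts any callable.
TextToAudio = Callable[[str], Sequence[float]]


@dataclass
class SyntheticPair:
    caption: str           # caption reused from a text corpus (e.g. image captions)
    waveform: List[float]  # audio generated from that caption
    sample_rate: int       # assumed rate of the generator's output


def build_synthetic_pairs(
    captions: Iterable[str],
    generate: TextToAudio,
    sample_rate: int = 16000,
) -> List[SyntheticPair]:
    """Hypothetical helper: pair each non-empty caption with generated audio."""
    pairs: List[SyntheticPair] = []
    for caption in captions:
        text = caption.strip()
        if not text:
            continue  # skip blank captions instead of generating audio for them
        pairs.append(SyntheticPair(text, list(generate(text)), sample_rate))
    return pairs


def stub_generator(caption: str) -> List[float]:
    """Placeholder generator: silence whose length grows with the caption."""
    return [0.0] * (10 * len(caption))


pairs = build_synthetic_pairs(
    ["A dog barks in a park.", "   ", "Rain falls on a tin roof."],
    stub_generator,
)
print(len(pairs))  # 2
```

Keeping the generator as a plain callable means the pairing logic does not depend on any particular text-to-audio model; swapping the stub for a real model changes only the `generate` argument.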

Findings

01

Synth-AC improves captioning performance on benchmark datasets.

02

Synthetic data augmentation leads to significant performance gains.

03

The framework is adaptable to various existing models.

Abstract

Data-driven approaches hold promise for audio captioning. However, the development of audio captioning methods can be biased by the limited availability and quality of text-audio data. This paper proposes the SynthAC framework, which leverages recent advances in audio generative models and commonly available text corpora to create synthetic text-audio pairs, thereby enhancing text-audio representation. Specifically, a text-to-audio generation model, AudioLDM, is used to generate synthetic audio signals from the captions of an image captioning dataset. SynthAC thus extends well-annotated captions from the text-vision domain to audio captioning, enhancing text-audio representation by learning the relations within synthetic text-audio pairs. Experiments demonstrate that our SynthAC framework can benefit audio captioning models by incorporating well-annotated text…
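How the synthetic pairs enter training is not spelled out in this summary. One simple, model-agnostic option, assumed here rather than taken from the paper, is to append a share of synthetic pairs to the real audio captioning data and shuffle the result. The helper `mix_training_pairs` and its `synthetic_ratio` parameter are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

Pair = TypeVar("Pair")  # any (caption, audio) representation


def mix_training_pairs(
    real: Sequence[Pair],
    synthetic: Sequence[Pair],
    synthetic_ratio: float = 0.5,
    seed: int = 0,
) -> List[Pair]:
    """Hypothetical helper: add synthetic pairs to real ones and shuffle.

    synthetic_ratio = synthetic pairs added per real pair, capped by the
    number of synthetic pairs available.
    """
    if synthetic_ratio < 0:
        raise ValueError("synthetic_ratio must be non-negative")
    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    n_synth = min(len(synthetic), int(len(real) * synthetic_ratio))
    mixed = list(real) + rng.sample(list(synthetic), n_synth)
    rng.shuffle(mixed)
    return mixed


real = [(f"real caption {i}", f"real_{i}.wav") for i in range(4)]
synthetic = [(f"synthetic caption {i}", f"synth_{i}.wav") for i in range(10)]
mixed = mix_training_pairs(real, synthetic, synthetic_ratio=0.5)
print(len(mixed))  # 6
```

Because the mixing happens at the data level, any captioning model that consumes (caption, audio) pairs could train on the result without architectural changes; the ratio is a knob one would tune rather than a value reported by the authors.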

Code & Models

Repositories

littleflyingsheep/synthac
Official

Taxonomy

Topics: Music and Audio Processing