In-Context Prompt Editing For Conditional Audio Generation

Ernie Chang; Pin-Jie Lin; Yang Li; Sidd Srinivasan; Gael Le Lan; David Kant; Yangyang Shi; Forrest Iandola; Vikas Chandra

arXiv:2311.00895·cs.SD·November 3, 2023·1 cites

TL;DR

This paper introduces a retrieval-based in-context prompt editing method that improves the quality of text-to-audio generation by revisiting user prompts with training captions as exemplars, addressing distributional shift issues.

Contribution

The paper proposes a novel retrieval-based prompt editing framework that enhances audio quality in conditional generation by leveraging training captions as exemplars.

Findings

01. Audio quality improved across user prompts

02. Prompt editing reduces distributional shift effects

03. Framework leverages training captions as exemplars

Abstract

Distributional shift is a central challenge in the deployment of machine learning models as they can be ill-equipped for real-world data. This is particularly evident in text-to-audio generation where the encoded representations are easily undermined by unseen prompts, which leads to the degradation of generated audio -- the limited set of the text-audio pairs remains inadequate for conditional audio generation in the wild as user prompts are under-specified. In particular, we observe a consistent audio quality degradation in generated audio samples with user prompts, as opposed to training set prompts. To this end, we present a retrieval-based in-context prompt editing framework that leverages the training captions as demonstrative exemplars to revisit the user prompts. We show that the framework enhanced the audio quality across the set of collected user prompts, which were edited…
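The retrieve-then-edit idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the bag-of-words cosine retriever, the function names, the example captions, and the instruction wording are hypothetical and are not taken from the paper, which may use a different retriever and prompt template. The sketch only shows the general shape: find the training captions closest to a user prompt, then place them in an in-context prompt that a language model could complete with an edited version.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_exemplars(user_prompt, training_captions, k=2):
    """Return the k training captions most similar to the user prompt.

    Assumption: a simple lexical retriever stands in for whatever
    retriever the paper actually uses.
    """
    query = Counter(tokenize(user_prompt))
    ranked = sorted(
        training_captions,
        key=lambda c: cosine(query, Counter(tokenize(c))),
        reverse=True,
    )
    return ranked[:k]


def build_edit_prompt(user_prompt, exemplars):
    """Assemble an in-context prompt (hypothetical template) for an LLM editor."""
    lines = ["Rewrite the user prompt in the style of these training captions."]
    lines += [f"Caption: {c}" for c in exemplars]
    lines.append(f"User prompt: {user_prompt}")
    lines.append("Edited prompt:")
    return "\n".join(lines)


# Hypothetical training captions, invented for this example.
captions = [
    "A dog barks repeatedly while birds chirp in the background",
    "Rain falls steadily on a metal roof",
    "A man speaks as a car engine idles nearby",
]
exemplars = retrieve_exemplars("dog barking", captions, k=1)
edit_prompt = build_edit_prompt("dog barking", exemplars)
print(edit_prompt)
```

The resulting string would be sent to a language model of choice, and its completion would replace the under-specified user prompt before it reaches the text-to-audio model.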

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing

Methods: Sparse Evolutionary Training