IteraTTA: An interface for exploring both text prompts and audio priors in generating music with text-to-audio models
Hiromu Yakura, Masataka Goto

TL;DR
IteraTTA is an interactive interface that enables users to explore and refine both text prompts and audio priors in text-to-audio music generation, enhancing understanding and control over the generated music.
Contribution
This work introduces a novel interface, IteraTTA, that facilitates dual-sided exploration of text prompts and audio priors for improved music generation with text-to-audio models.
Findings
Users can effectively explore the impact of prompts and priors.
IteraTTA helps users understand the space of generated music.
Design considerations improve interaction with text-to-audio models.
Abstract
Recent text-to-audio generation techniques have the potential to allow novice users to freely generate music audio. Even without musical knowledge, such as knowledge of chord progressions and instruments, users can try various text prompts to generate audio. However, compared to the image domain, gaining a clear understanding of the space of possible music audio is difficult because users cannot listen to variations of the generated audio simultaneously. We therefore help users explore not only text prompts but also audio priors that constrain the text-to-audio music generation process. This dual-sided exploration enables users to discern the impact of different text prompts and audio priors on the generation results by iteratively comparing them. Our developed interface, IteraTTA, is specifically designed to aid users in refining text prompts and selecting…
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Video Analysis and Summarization
