SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation
Sonal Kumar, Prem Seetharaman, Justin Salamon, Dinesh Manocha, Oriol Nieto

TL;DR
SILA introduces a novel method for fine-grained control over acoustic parameters in text-to-audio generation, enabling more expressive and customizable sound synthesis beyond traditional DSP techniques.
Contribution
The paper presents a model-agnostic approach that learns to disentangle audio semantics from acoustic features, allowing precise control over sound characteristics in generated audio.
Findings
Effective control over loudness, pitch, reverb, and other parameters.
High-quality audio outputs closely match user specifications.
Enhanced versatility for creative sound design.
Abstract
The field of text-to-audio generation has seen significant advancements, and yet the ability to finely control the acoustic characteristics of generated audio remains under-explored. In this paper, we introduce a novel yet simple approach to generate sound effects with control over key acoustic parameters such as loudness, pitch, reverb, fade, brightness, noise and duration, enabling creative applications in sound design and content creation. These parameters extend beyond traditional Digital Signal Processing (DSP) techniques, incorporating learned representations that capture the subtleties of how sound characteristics can be shaped in context, enabling a richer and more nuanced control over the generated audio. Our approach is model-agnostic and is based on learning the disentanglement between audio semantics and its acoustic features. Our approach not only enhances the versatility…
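The abstract excerpt does not spell out how acoustic parameters are tied to text, so the following is only an illustrative sketch of the general idea the title suggests: measure simple properties of a signal and append them to its caption as words. It assumes NumPy, and every function name, threshold, and caption format here is hypothetical and not taken from the paper. Loudness is approximated by RMS level in dBFS and brightness by the spectral centroid.

```python
import numpy as np


def rms_dbfs(x: np.ndarray) -> float:
    """RMS level in dB relative to full scale (1.0)."""
    rms = float(np.sqrt(np.mean(np.square(x))))
    return 20.0 * np.log10(max(rms, 1e-12))  # floor avoids log(0) on silence


def spectral_centroid_hz(x: np.ndarray, sr: int) -> float:
    """Magnitude-weighted mean frequency of the whole clip."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    total = float(mag.sum())
    return float((freqs * mag).sum() / total) if total > 0 else 0.0


def describe(x: np.ndarray, sr: int) -> dict:
    """Map measured features to coarse words (thresholds are assumptions)."""
    return {
        "loudness": "loud" if rms_dbfs(x) > -20.0 else "soft",
        "brightness": "bright" if spectral_centroid_hz(x, sr) > 2000.0 else "dull",
        "duration": f"{len(x) / sr:.1f} seconds",
    }


def augment_caption(caption: str, x: np.ndarray, sr: int) -> str:
    """Append the acoustic descriptors to an existing semantic caption."""
    d = describe(x, sr)
    return (
        f"{caption}, loudness: {d['loudness']}, "
        f"brightness: {d['brightness']}, duration: {d['duration']}"
    )


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
    print(augment_caption("a steady tone", tone, sr))
```

Keeping the semantic caption and the acoustic descriptors as separate text fields is one plausible way to let a text-to-audio model associate each descriptor with a signal property independently of what the sound is. Whether this matches the authors' pipeline cannot be confirmed from the excerpt.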
