SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation

Sonal Kumar; Prem Seetharaman; Justin Salamon; Dinesh Manocha; Oriol Nieto

arXiv:2412.09789·cs.SD·December 16, 2024

TL;DR

SILA introduces a novel method for fine-grained control over acoustic parameters in text-to-audio generation, enabling more expressive and customizable sound synthesis beyond traditional DSP techniques.

Contribution

The paper presents a model-agnostic approach that learns to disentangle audio semantics from acoustic features, allowing precise control over sound characteristics in generated audio.

Findings

01 Effective control over loudness, pitch, reverb, fade, brightness, noise, and duration.

02 High-quality audio outputs closely match user specifications.

03 Enhanced versatility for creative sound design.

Abstract

The field of text-to-audio generation has seen significant advancements, and yet the ability to finely control the acoustic characteristics of generated audio remains under-explored. In this paper, we introduce a novel yet simple approach to generate sound effects with control over key acoustic parameters such as loudness, pitch, reverb, fade, brightness, noise and duration, enabling creative applications in sound design and content creation. These parameters extend beyond traditional Digital Signal Processing (DSP) techniques, incorporating learned representations that capture the subtleties of how sound characteristics can be shaped in context, enabling a richer and more nuanced control over the generated audio. Our approach is model-agnostic and is based on learning the disentanglement between audio semantics and its acoustic features. Our approach not only enhances the versatility…
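The abstract does not spell out the pipeline, so the following is only a rough illustration of what the "signal-to-language" idea in the title could look like: measure an acoustic property from the waveform and append a matching description to the text caption. Everything here is an assumption made for illustration, not the authors' method: the function names (`rms_dbfs`, `loudness_word`, `augment_caption`, `sine`), the dB thresholds, and the caption template are all hypothetical, and only loudness and duration are covered.

```python
import math


def rms_dbfs(samples):
    """Root-mean-square level in dB relative to full scale (amplitude 1.0)."""
    if not samples:
        raise ValueError("empty signal")
    mean_sq = sum(x * x for x in samples) / len(samples)
    # Clamp to avoid log10(0) on digital silence.
    return 10.0 * math.log10(max(mean_sq, 1e-12))


def loudness_word(db):
    """Map a dB level to a coarse descriptor.

    The thresholds are hypothetical, chosen only for this sketch.
    """
    if db < -30.0:
        return "quiet"
    if db < -12.0:
        return "moderately loud"
    return "loud"


def augment_caption(caption, samples, sample_rate):
    """Append signal-derived acoustic descriptors to a text caption."""
    duration = len(samples) / sample_rate
    word = loudness_word(rms_dbfs(samples))
    return f"{caption}, {word}, {duration:.1f} seconds long"


def sine(freq, amp, seconds, sample_rate=16000):
    """Generate a test tone as a list of floats in [-amp, amp]."""
    n = int(seconds * sample_rate)
    return [amp * math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    tone = sine(440.0, 0.5, 2.0)
    print(augment_caption("a steady tone", tone, 16000))
```

Captions augmented this way could, in principle, be paired with the original audio when training a text-to-audio model, so that words such as "quiet" or "loud" become associated with measurable signal properties. The other parameters named in the abstract (pitch, reverb, fade, brightness, noise) would need their own measurements, which this sketch leaves out.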
