SoundWeaver: Semantic Warm-Starting for Text-to-Audio Diffusion Serving

Ayush Barik; Sofia Stoica; Nikhil Sarda; Arnav Kethana; Abhinav Khanduja; Muchen Xu; Fan Lai

arXiv:2603.07865 · cs.SD · March 10, 2026

TL;DR

SoundWeaver is a training-free, model-agnostic system that accelerates text-to-audio diffusion by leveraging semantically similar cached audio to reduce latency while maintaining quality.

Contribution

The paper introduces a warm-starting approach that combines semantic retrieval, dynamic skipping of function evaluations, and cache management for efficient text-to-audio diffusion serving.

Findings

01

Achieves a 1.8–3.0× latency reduction with a cache of only ~1K entries.

02

Maintains or improves perceptual audio quality.

03

Operates without additional training, enhancing practicality.
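
To put the first finding in perspective, the short calculation below converts a latency speedup into the fraction of function evaluations that would have to be skipped. It is a back-of-the-envelope sketch only: it assumes latency scales linearly with the number of function evaluations and that retrieval and cache overhead are negligible, neither of which is stated in the source. The function name `implied_skip_fraction` is hypothetical.

```python
def implied_skip_fraction(speedup: float) -> float:
    """Fraction of function evaluations skipped for a given speedup.

    Assumption (not from the paper): latency is proportional to the
    number of function evaluations, with no retrieval overhead.
    If a fraction s is skipped, latency shrinks to (1 - s) of the
    original, so speedup = 1 / (1 - s) and s = 1 - 1 / speedup.
    """
    if speedup < 1.0:
        raise ValueError("speedup must be at least 1.0")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    for speedup in (1.8, 3.0):
        frac = implied_skip_fraction(speedup)
        print(f"{speedup:.1f}x speedup -> skip about {frac:.0%} of evaluations")
```

Under these simplifying assumptions, the reported range would correspond to skipping roughly 44% to 67% of the evaluations; the real figures depend on overheads the summary does not quantify.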

Abstract

Text-to-audio diffusion models produce high-fidelity audio but require tens of function evaluations (NFEs), incurring multi-second latency and limited throughput. We present SoundWeaver, the first training-free, model-agnostic serving system that accelerates text-to-audio diffusion by warm-starting from semantically similar cached audio. SoundWeaver introduces three components: a Reference Selector that retrieves and temporally aligns cached candidates via semantic and duration-aware gating; a Skip Gater that dynamically determines the percentage of NFEs to skip; and a lightweight Cache Manager that maintains cache utility through quality-aware eviction and refinement. On real-world audio traces, SoundWeaver achieves 1.8–3.0× latency reduction with a cache of only ~1K entries while preserving or improving perceptual quality.
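
The retrieval and skipping steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `CacheEntry`, the functions `select_reference`, `skip_fraction`, and `steps_to_run`, the cosine-similarity gate, the duration-ratio gate, the linear skip rule, and all threshold values are assumptions chosen for illustration. Temporal alignment and the Cache Manager are omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class CacheEntry:
    embedding: list[float]  # text embedding of the cached prompt (hypothetical)
    duration_s: float       # length of the cached audio, assumed > 0
    audio_id: str           # handle to the stored audio


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_reference(query_emb, query_dur, cache, min_sim=0.7, max_dur_ratio=1.5):
    """Return (entry, similarity) for the best gated candidate, or (None, 0.0).

    Semantic gate: cosine similarity must reach min_sim.
    Duration gate: the longer/shorter duration ratio must not exceed
    max_dur_ratio. Both gates and their thresholds are illustrative.
    """
    best, best_sim = None, min_sim
    for entry in cache:
        ratio = max(entry.duration_s, query_dur) / min(entry.duration_s, query_dur)
        if ratio > max_dur_ratio:
            continue  # too different in length to align
        sim = cosine(query_emb, entry.embedding)
        if sim >= best_sim:
            best, best_sim = entry, sim
    return best, (best_sim if best is not None else 0.0)


def skip_fraction(sim, min_sim=0.7, max_skip=0.6):
    """Map similarity to a fraction of NFEs to skip (assumed linear rule)."""
    if sim < min_sim:
        return 0.0
    return max_skip * (sim - min_sim) / (1.0 - min_sim)


def steps_to_run(total_nfe, sim):
    """Number of function evaluations left after warm-starting."""
    return total_nfe - int(total_nfe * skip_fraction(sim))


if __name__ == "__main__":
    cache = [
        CacheEntry([1.0, 0.0], 10.0, "rain"),
        CacheEntry([0.0, 1.0], 10.0, "piano"),
        CacheEntry([1.0, 0.0], 30.0, "long-rain"),
    ]
    entry, sim = select_reference([1.0, 0.0], 10.0, cache)
    print(entry.audio_id if entry else None, steps_to_run(50, sim))
```

On a cache miss the selector returns no reference and the skip fraction is zero, so the request falls back to a full run from noise. Tying the skip fraction to similarity reflects the intuition that a closer reference needs less denoising, though the actual gating rule used by the Skip Gater is not given in this summary.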

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech Recognition and Synthesis