Score Distillation Sampling for Audio: Source Separation, Synthesis, and Beyond

Jessie Richter-Powell; Antonio Torralba; Jonathan Lorraine

arXiv:2505.04621·cs.SD·May 8, 2025

Open Access

TL;DR

This paper extends Score Distillation Sampling (SDS) to audio, enabling tasks such as source separation and sound synthesis with a single pretrained model and no specialized datasets.

Contribution

We introduce Audio-SDS, a generalization of SDS for text-conditioned audio diffusion, broadening its application to various audio tasks.

Findings

1. Enables source separation, impact sound simulation, and FM-synthesis calibration.
2. Operates without task-specific datasets using a single pretrained model.
3. Demonstrates versatility of distillation-based methods across audio modalities.

Abstract

We introduce Audio-SDS, a generalization of Score Distillation Sampling (SDS) to text-conditioned audio diffusion models. While SDS was initially designed for text-to-3D generation using image diffusion, its core idea of distilling a powerful generative prior into a separate parametric representation extends to the audio domain. Leveraging a single pretrained model, Audio-SDS enables a broad range of tasks without requiring specialized datasets. In particular, we demonstrate how Audio-SDS can guide physically informed impact sound simulations, calibrate FM-synthesis parameters, and perform prompt-specified source separation. Our findings illustrate the versatility of distillation-based methods across modalities and establish a robust foundation for future work using generative priors in audio tasks.
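
To make the "distilling a generative prior into a separate parametric representation" idea concrete, here is a minimal sketch of a generic SDS update loop in the style of the original text-to-3D formulation. It is not the paper's Audio-SDS implementation. Everything named below is an assumption for illustration: the two-sinusoid `render` function, the variance-preserving noise schedule, the weighting `sigma**2`, and especially `toy_eps_prediction`, a hypothetical analytic stand-in for a pretrained text-conditioned audio diffusion model whose "prior" is a single target tone.

```python
import numpy as np

SR = 16_000                      # sample rate in Hz (assumed)
N = 1_600                        # 0.1 s of audio, a whole number of cycles for both tones
time_axis = np.arange(N) / SR

# Parametric audio representation: amplitudes of a 440 Hz and an 880 Hz sinusoid.
BASIS = np.stack(
    [np.sin(2 * np.pi * 440 * time_axis), np.sin(2 * np.pi * 880 * time_axis)],
    axis=1,
)                                # shape (N, 2)
TARGET = 0.8 * BASIS[:, 0]       # the only signal the toy "prior" considers likely


def render(theta):
    """Differentiable renderer x = g(theta); here linear, so dx/dtheta = BASIS."""
    return BASIS @ theta


def toy_eps_prediction(x_t, alpha, sigma):
    """Hypothetical stand-in for a pretrained denoiser.

    If the data prior were a point mass at TARGET, this would be the exact
    noise prediction. A real model would also take a text prompt and timestep.
    """
    return (x_t - alpha * TARGET) / sigma


def sds_step(theta, rng, lr=1.0):
    """One SDS update: noise the render, query the denoiser, push the residual back."""
    t = rng.uniform(0.02, 0.98)                  # random diffusion time
    alpha, sigma = np.sqrt(1.0 - t), np.sqrt(t)  # assumed variance-preserving schedule
    eps = rng.standard_normal(N)
    x_t = alpha * render(theta) + sigma * eps    # forward-noised render
    eps_hat = toy_eps_prediction(x_t, alpha, sigma)
    weight = sigma**2                            # assumed weighting w(t)
    # SDS gradient: w(t) * (eps_hat - eps) chained through dx/dtheta,
    # without differentiating through the denoiser itself.
    grad = BASIS.T @ (weight * (eps_hat - eps)) / N
    return theta - lr * grad


def run_sds(steps=300, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.array([0.1, 0.5])                 # deliberately wrong initial amplitudes
    for _ in range(steps):
        theta = sds_step(theta, rng)
    return theta
```

Calling `run_sds()` should move the amplitudes from `[0.1, 0.5]` toward roughly `[0.8, 0.0]`, the only setting the toy prior favors. In a realistic setup, `render` would be something like an FM synthesizer, an impact-sound simulator, or a set of separated sources, and the denoiser would be a pretrained text-conditioned audio diffusion model queried with a prompt.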

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Music Technology and Sound Studies · Speech and Audio Processing

Methods: Diffusion