Training-Free Multi-Step Audio Source Separation

Yongyi Zang; Jingyi Li; Qiuqiang Kong

arXiv:2505.19534·cs.SD·May 27, 2025

Open Access · 1 Repo

TL;DR

This paper introduces a training-free multi-step inference method for audio source separation. It iteratively re-applies a pretrained one-step model to a blend of the input mixture and the previous step's output, and outperforms one-step inference with the same model without additional training.

Contribution

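The authors propose an inference technique that reuses existing pretrained one-step models for multi-step separation. They prove that it improves on one-step inference, give error bounds based on model smoothness and metric robustness, and report empirical improvements on speech and music separation.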

Findings

01 Consistently outperforms one-step inference in speech and music separation

02 Achieves performance comparable to training larger models or using more data

03 Improves metrics that were not optimized, in addition to the metric used to choose the blending ratio

Abstract

Audio source separation aims to separate a mixture into target sources. Previous audio source separation systems usually conduct one-step inference, which does not fully explore the separation ability of models. In this work, we reveal that pretrained one-step audio source separation models can be leveraged for multi-step separation without additional training. We propose a simple yet effective inference method that iteratively applies separation by optimally blending the input mixture with the previous step's separation result. At each step, we determine the optimal blending ratio by maximizing a metric. We prove that our method always yields improvement over one-step inference, provide error bounds based on model smoothness and metric robustness, and provide theoretical analysis connecting our method to denoising along linear interpolation paths between noise and clean distributions, a…
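
The blending loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `multi_step_separate`, `snr_db`, and `moving_average` are hypothetical, the blending ratio is chosen by a simple grid search, the "pretrained model" is replaced by a toy moving-average denoiser, and the metric is scored against a known reference signal, which is only available in an evaluation setting.

```python
import numpy as np


def snr_db(reference, estimate, eps=1e-12):
    """Signal-to-noise ratio in dB between a reference and an estimate (higher is better)."""
    error = reference - estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps))


def multi_step_separate(model, mixture, metric, num_steps=3, ratios=None):
    """Hypothetical sketch of training-free multi-step inference.

    model:   callable mapping a signal to a separated estimate (assumed pretrained).
    metric:  callable scoring an estimate; larger means better.
    ratios:  candidate blending ratios, searched on a grid (an assumption).
    """
    if ratios is None:
        ratios = np.linspace(0.0, 1.0, 11)

    # Step 1 is ordinary one-step inference on the raw mixture.
    estimate = model(mixture)
    best_score = metric(estimate)

    for _ in range(num_steps - 1):
        step_estimate, step_score = estimate, best_score
        for r in ratios:
            # Blend the input mixture with the previous step's estimate.
            blended = (1.0 - r) * mixture + r * estimate
            candidate = model(blended)
            score = metric(candidate)
            # Keep a candidate only if it scores strictly better.
            if score > step_score:
                step_estimate, step_score = candidate, score
        estimate, best_score = step_estimate, step_score

    return estimate, best_score


def moving_average(x, width=9):
    """Toy stand-in for a separation model: smooths out broadband noise."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")


# Toy data: a slow sine as the target source, plus white noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 2000)
target = np.sin(2 * np.pi * 3 * t)
mixture = target + 0.5 * rng.standard_normal(t.size)


def metric(est):
    return snr_db(target, est)


one_step_score = metric(moving_average(mixture))
estimate, multi_step_score = multi_step_separate(moving_average, mixture, metric)
print(f"one-step SNR: {one_step_score:.2f} dB, multi-step SNR: {multi_step_score:.2f} dB")
```

Because the loop keeps the previous estimate whenever no blended candidate scores higher, the selected metric cannot decrease from one step to the next in this sketch. That mirrors the abstract's claim of improvement over one-step inference only for the metric being maximized; the paper's own proof and error bounds are not reproduced here.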

Code & Models

Repositories

yongyizang/trainingfreemultistepasr
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis

Methods: Diffusion