User-guided Generative Source Separation

Yutong Wen, Minje Kim, and Paris Smaragdis

arXiv:2507.01339·cs.SD·August 6, 2025

Open Access

TL;DR

GuideSep introduces a flexible, diffusion-based music source separation model that allows user-guided, instrument-agnostic extraction, surpassing traditional fixed-class methods in versatility and quality.

Contribution

This work presents GuideSep, a novel diffusion-based MSS model conditioned on user inputs, enabling versatile and high-quality instrument separation beyond standard four-stem setups.

Findings

01

Achieves high-quality separation with user-guided inputs

02

Demonstrates versatility in extracting various instruments

03

Outperforms prior fixed-class separation methods

Abstract

Music source separation (MSS) aims to extract individual instrument sources from their mixture. While most existing methods focus on the widely adopted four-stem separation setup (vocals, bass, drums, and other instruments), this approach lacks the flexibility needed for real-world applications. To address this, we propose GuideSep, a diffusion-based MSS model capable of instrument-agnostic separation beyond the four-stem setup. GuideSep is conditioned on multiple inputs: a waveform mimicry condition, which can be easily provided by humming or playing the target melody, and mel-spectrogram domain masks, which offer additional guidance for separation. Unlike prior approaches that relied on fixed class labels or sound queries, our conditioning scheme, coupled with the generative approach, provides greater flexibility and applicability. Additionally, we design a mask-prediction baseline…

Taxonomy

Topics: Speech and Audio Processing · Music Technology and Sound Studies · Music and Audio Processing