Task-Aware Unified Source Separation

Kohei Saijo; Janek Ebbers; François G. Germain; Gordon Wichern; Jonathan Le Roux

arXiv:2410.23987 · eess.AS · November 1, 2024

TL;DR

The paper introduces TUSS, a task-aware unified source separation model that uses learnable prompts to adapt to various separation tasks, including contradictory ones, demonstrating flexibility and effectiveness across five major tasks.

Contribution

It proposes a novel prompt-based approach enabling a single model to handle multiple, even contradictory, source separation tasks with high flexibility.

Findings

1. Successfully handles five major separation tasks
2. Demonstrates flexible behavior based on prompts
3. Effective on both synthetic and real recordings

Abstract

Several attempts have been made to handle multiple source separation tasks such as speech enhancement, speech separation, sound event separation, music source separation (MSS), or cinematic audio source separation (CASS) with a single model. These models are trained on large-scale data including speech, instruments, or sound events and can often successfully separate a wide range of sources. However, it is still challenging for such models to cover all separation tasks because some of them are contradictory (e.g., musical instruments are separated in MSS while they have to be grouped in CASS). To overcome this issue and support all the major separation tasks, we propose a task-aware unified source separation (TUSS) model. The model uses a variable number of learnable prompts to specify which source to separate, and changes its behavior depending on the given prompts, enabling it to…
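To make the prompt mechanism concrete, here is a toy NumPy sketch of prompt-conditioned separation. It is not the authors' TUSS architecture. The prompt names, the embedding size, the single parameter-free attention step, and the sigmoid masking are all illustrative assumptions. What it tries to show is the idea described above: the caller passes a variable-length list of prompts, those prompts are processed jointly with the encoded mixture, and the model returns one output per prompt. As a result, the same mixture can be split into four stems (an MSS-style request) or into speech, music and effects (a CASS-style request).

```python
import numpy as np

# Hypothetical prompt vocabulary and embedding size (illustrative assumptions).
PROMPT_NAMES = ["speech", "vocals", "bass", "drums", "other", "music_mix", "sfx"]
DIM = 16

rng = np.random.default_rng(0)
# One embedding per prompt; random here, learned during training in a real model.
PROMPT_TABLE = {name: rng.standard_normal(DIM) for name in PROMPT_NAMES}


def self_attention(x):
    """Single-head scaled dot-product self-attention (no learned projections)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return x + weights @ x  # residual connection


def separate(mixture_feats, prompts):
    """Return one masked feature map per prompt, shape (len(prompts), T, DIM).

    mixture_feats: (T, DIM) array standing in for an already-encoded mixture.
    prompts: list of prompt names; its length sets the number of outputs.
    """
    p = np.stack([PROMPT_TABLE[name] for name in prompts])  # (N, DIM)
    # Prepend the prompts to the mixture frames so both are processed jointly.
    joint = self_attention(np.concatenate([p, mixture_feats], axis=0))
    p_out, feats = joint[: len(prompts)], joint[len(prompts):]
    outputs = []
    for q in p_out:
        # Condition on one prompt via element-wise product, squash to a (0, 1) mask.
        mask = 1.0 / (1.0 + np.exp(-(feats * q)))
        outputs.append(mask * mixture_feats)
    return np.stack(outputs)


mix = rng.standard_normal((50, DIM))  # 50 frames of a stand-in encoded mixture
mss = separate(mix, ["vocals", "bass", "drums", "other"])  # instruments split apart
cass = separate(mix, ["speech", "music_mix", "sfx"])  # instruments grouped as music
print(mss.shape, cass.shape)  # (4, 50, 16) (3, 50, 16)
```

The contradiction between MSS and CASS disappears at the interface level because the prompts determine both how many outputs the model produces and how sources are grouped. Whether a trained model actually separates or groups the instruments depends on how the prompts were learned, and this untrained sketch does not reproduce that.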

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis