Multi-Task Audio Source Separation
Lu Zhang, Chenxing Li, Feng Deng, and Xiaorui Wang

TL;DR
This paper introduces a new multi-task audio source separation challenge, proposes a complex domain model with residual compensation, and demonstrates its superior performance in separating speech, music, and noise from monaural mixtures.
Contribution
The paper presents a novel multi-task separation framework and a new dataset, and shows improved results over existing models in separating multiple audio sources.
Findings
The complex ratio mask is effective for multi-task separation.
Residual signal compensation improves separation quality.
The proposed model outperforms several well-known separation models.
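As a rough illustration of the first finding, the sketch below shows what a complex ratio mask does when applied to a mixture spectrogram. It is a minimal NumPy example, not the paper's model: the spectrograms are random complex arrays standing in for STFTs of speech, music, and noise, the mask is the ideal (oracle) mask computed from the known source rather than predicted by a network, and the function names, array shape, and `eps` value are assumptions made for this example.

```python
import numpy as np

def ideal_complex_ratio_mask(mixture_spec, source_spec, eps=1e-12):
    """Oracle complex mask M = S / Y, written as S * conj(Y) / (|Y|^2 + eps)."""
    return source_spec * np.conj(mixture_spec) / (np.abs(mixture_spec) ** 2 + eps)

def ideal_magnitude_mask(mixture_spec, source_spec, eps=1e-12):
    """Oracle real-valued mask |S| / |Y|, which leaves the mixture phase untouched."""
    return np.abs(source_spec) / (np.abs(mixture_spec) + eps)

# Hypothetical stand-ins for STFTs (frequency bins x frames) of the three sources.
rng = np.random.default_rng(0)
shape = (129, 20)
speech = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
music = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Monaural mixture: the sources add in the complex STFT domain.
mixture = speech + music + noise

# A complex mask rescales magnitude and rotates phase in each time-frequency bin.
crm = ideal_complex_ratio_mask(mixture, speech)
speech_from_crm = crm * mixture

# A magnitude mask only rescales, so the estimate keeps the mixture's phase.
mag_mask = ideal_magnitude_mask(mixture, speech)
speech_from_mag = mag_mask * mixture

crm_error = np.max(np.abs(speech_from_crm - speech))
mag_error = np.max(np.abs(speech_from_mag - speech))
print("max error, complex mask:  ", crm_error)
print("max error, magnitude mask:", mag_error)
```

With oracle masks, the complex mask recovers the source up to numerical precision, while the magnitude-only mask cannot correct the phase. In a trained system the mask is estimated by the network, so the gap is smaller than in this idealized comparison.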
Abstract
The audio source separation tasks, such as speech enhancement, speech separation, and music source separation, have achieved impressive performance in recent studies. The powerful modeling capabilities of deep neural networks give us hope for more challenging tasks. This paper launches a new multi-task audio source separation (MTASS) challenge to separate the speech, music, and noise signals from the monaural mixture. First, we introduce the details of this task and generate a dataset of mixtures containing speech, music, and background noises. Then, we propose an MTASS model in the complex domain to fully utilize the differences in spectral characteristics of the three audio signals. In detail, the proposed model follows a two-stage pipeline, which separates the three types of audio signals and then performs signal compensation separately. After comparing different training targets,…
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Adaptive Filtering Techniques
