Multichannel Singing Voice Separation by Deep Neural Network Informed DOA Constrained CNMF

Antonio J. Muñoz-Montoro; Julio J. Carabias-Orti; Archontis Politis; Konstantinos Drossos

arXiv:2003.01162 · eess.AS · March 4, 2020 · 1 citation

Open Access

TL;DR

This paper introduces a multichannel singing voice separation method that combines deep-learning-based spectral inference with a CNMF-based spatial covariance mixing model, and reports that the joint method outperforms both the monophonic DL-only and the multichannel CNMF-only baselines.

Contribution

The paper presents a novel joint framework integrating deep neural network-based spectral inference with CNMF for multichannel singing voice separation.

Findings

01

The joint DL+CNMF method outperforms individual DL and CNMF baselines.

02

The MaD TwinNet used for spectral inference is able to model long-term temporal patterns of a musical piece.

03

The evaluation is carried out on the task of singing voice separation with a large multichannel dataset.

Abstract

This work addresses the problem of multichannel source separation by combining two powerful approaches: multichannel spectral factorization and recent monophonic deep-learning (DL) based spectrum inference. Individual source spectra at different channels are estimated with a Masker-Denoiser Twin Network (MaD TwinNet), which is able to model long-term temporal patterns of a musical piece. The monophonic source spectrograms are used within a spatial covariance mixing model based on Complex Non-Negative Matrix Factorization (CNMF) that predicts the spatial characteristics of each source. The proposed framework is evaluated on the task of singing voice separation with a large multichannel dataset. Experimental results show that our joint DL+CNMF method outperforms both the individual monophonic DL-based separation and the multichannel CNMF baseline methods.
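To make the two-stage idea in the abstract concrete, here is a minimal NumPy sketch of how per-source spectrogram estimates can be paired with a spatial covariance model. It is not the paper's DOA-constrained CNMF: it swaps in a simpler, generic scheme (mask-weighted spatial covariance estimation followed by a multichannel Wiener filter) purely for illustration. The function names `spatial_covariances` and `multichannel_wiener`, the array layouts, and the random stand-in data are all assumptions; in the paper's setting the source spectrograms would come from MaD TwinNet.

```python
import numpy as np


def spatial_covariances(X, V, eps=1e-10):
    """Estimate one time-invariant spatial covariance per source and frequency.

    X: mixture STFT, complex array of shape (F, T, M) for M channels.
    V: nonnegative source power spectrograms, shape (S, F, T)
       (assumed to come from a monophonic separator).
    Returns R with shape (S, F, M, M), each matrix scaled to trace M.
    """
    M = X.shape[-1]
    # Soft time-frequency masks derived from the source spectrograms.
    masks = V / (V.sum(axis=0, keepdims=True) + eps)
    # Per-bin outer products x x^H of the mixture, shape (F, T, M, M).
    XX = X[..., :, None] * X[..., None, :].conj()
    # Mask-weighted average over time frames.
    R = np.einsum("sft,ftmn->sfmn", masks, XX)
    R = R / (masks.sum(axis=2)[..., None, None] + eps)
    # Fix the scale ambiguity: spectral power lives in V, not in R.
    tr = np.trace(R, axis1=-2, axis2=-1).real
    return R * M / (tr[..., None, None] + eps)


def multichannel_wiener(X, V, R, eps=1e-10):
    """Recover multichannel source images with a multichannel Wiener filter.

    Returns Y with shape (S, F, T, M).
    """
    M = X.shape[-1]
    # Source image covariances v_s(f, t) * R_s(f), shape (S, F, T, M, M).
    C = V[..., None, None] * R[:, :, None, :, :]
    # Mixture covariance with a small diagonal load for invertibility.
    C_mix = C.sum(axis=0) + eps * np.eye(M)
    W = C @ np.linalg.inv(C_mix)
    return np.einsum("sftmn,ftn->sftm", W, X)


# Usage with random stand-in data (2 sources, 2 channels).
rng = np.random.default_rng(0)
F, T, M, S = 5, 20, 2, 2
X = rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))
V = rng.random((S, F, T)) + 0.1
R = spatial_covariances(X, V)
Y = multichannel_wiener(X, V, R)
```

One property of this kind of filter is that the source estimates add back up to the mixture (up to the small diagonal load), which gives a quick sanity check on any implementation.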

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis