Monaural Singing Voice Separation with Skip-Filtering Connections and Recurrent Inference of Time-Frequency Mask

Stylianos Ioannis Mimilakis; Konstantinos Drossos; João F. Santos; Gerald Schuller; Tuomas Virtanen; Yoshua Bengio

arXiv:1711.01437·cs.SD·February 14, 2018


TL;DR

This paper presents a deep learning approach for monaural singing voice separation that learns and optimizes a source-dependent time-frequency mask during training, removing the need for a generalized Wiener filtering post-processing step and improving separation quality.

Contribution

It introduces a learnable, source-dependent masking method that combines a recurrent inference algorithm, a sparse transformation step that improves mask generation, and a learned denoising filter.

Findings

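01 Signal-to-distortion ratio (SDR) increased by 0.49 dB over previous state-of-the-art approaches

02 Signal-to-interference ratio (SIR) increased by 0.30 dB over previous state-of-the-art approaches

03 The proposed method removes the need for the generalized Wiener filtering post-processing step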
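
To put the reported gains in perspective, the short sketch below converts a difference in decibels into the corresponding ratio of energies, using only the standard definition of the decibel (10 times the base-10 logarithm of an energy ratio). The function name `db_to_energy_ratio` is a hypothetical helper introduced for illustration; it is not part of the paper or of any evaluation toolkit, and it says nothing about how the authors computed SDR or SIR.

```python
def db_to_energy_ratio(gain_db: float) -> float:
    """Convert a gain in dB to a linear energy ratio.

    Uses the standard definition: gain_db = 10 * log10(ratio).
    Hypothetical helper for illustration only.
    """
    return 10.0 ** (gain_db / 10.0)


# Gains reported in the findings above.
sdr_ratio = db_to_energy_ratio(0.49)  # roughly 1.12
sir_ratio = db_to_energy_ratio(0.30)  # roughly 1.07

print(f"SDR gain of 0.49 dB ~ x{sdr_ratio:.3f} in energy ratio")
print(f"SIR gain of 0.30 dB ~ x{sir_ratio:.3f} in energy ratio")
```

Read this way, a 0.49 dB gain means the target-to-distortion energy ratio grows by about 12 percent, and a 0.30 dB gain means the target-to-interference energy ratio grows by about 7 percent.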

Abstract

Singing voice separation based on deep learning relies on time-frequency masking. In many cases the masking process is not a learnable function or is not encapsulated in the deep learning optimization. Consequently, most existing methods rely on a post-processing step using generalized Wiener filtering. This work proposes a method that learns and optimizes (during training) a source-dependent mask and does not need the aforementioned post-processing step. We introduce a recurrent inference algorithm, a sparse transformation step to improve the mask generation process, and a learned denoising filter. The obtained results show an increase of 0.49 dB in signal-to-distortion ratio and 0.30 dB in signal-to-interference ratio compared to previous state-of-the-art approaches for monaural singing voice separation.
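
As a rough illustration of the two ideas the abstract contrasts, the sketch below applies a time-frequency mask to a mixture magnitude spectrogram and builds generalized Wiener-style soft masks from per-source magnitude estimates. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names `apply_mask` and `wiener_style_masks`, the exponent `power=2.0`, the toy random spectrograms, and the simplification that source magnitudes add up to the mixture magnitude are all assumptions made for the example. In the proposed method the mask is produced and optimized inside the network, which this sketch does not attempt to reproduce.

```python
import numpy as np


def apply_mask(mixture_mag: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Time-frequency masking: element-wise product of mask and mixture magnitude."""
    return mask * mixture_mag


def wiener_style_masks(source_mags, power: float = 2.0, eps: float = 1e-8) -> np.ndarray:
    """Generalized Wiener-style soft masks (illustrative post-processing step).

    Each source's mask is its magnitude raised to `power`, divided by the
    sum over all sources, so the masks sum to roughly one in every bin.
    """
    powered = np.stack([np.abs(s) ** power for s in source_mags])
    return powered / (powered.sum(axis=0, keepdims=True) + eps)


# Toy magnitude spectrograms (frequency bins x time frames); random stand-ins
# for the per-source estimates a separation model might produce.
rng = np.random.default_rng(0)
voice_mag = rng.random((4, 6))
accomp_mag = rng.random((4, 6))

# Simplifying assumption for the example: magnitudes are additive.
mixture_mag = voice_mag + accomp_mag

masks = wiener_style_masks([voice_mag, accomp_mag])
voice_est = apply_mask(mixture_mag, masks[0])
accomp_est = apply_mask(mixture_mag, masks[1])
```

Because the soft masks sum to approximately one in every time-frequency bin, the two masked estimates add back up to the mixture magnitude. A learned mask has no such built-in constraint; it is simply whatever the network outputs before the element-wise product with the mixture.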
