Singing Voice Separation Using a Deep Convolutional Neural Network Trained by Ideal Binary Mask and Cross Entropy

Kin Wah Edward Lin; Balamurali B.T.; Enyan Koh; Simon Lui; Dorien Herremans

arXiv:1812.01278·cs.SD·December 5, 2018·1 cites

TL;DR

This paper introduces a deep convolutional neural network that treats singing voice separation as a pixel-wise classification problem, using ideal binary masks and cross entropy loss to improve accuracy and eliminate complex postprocessing.

Contribution

The novel approach applies pixel-wise image classification techniques to singing voice separation, outperforming previous methods and simplifying the processing pipeline.

Findings

01. Outperforms the MIREX 2016 and 2014 winners on the GNSDR metric.

02. Achieves results on the DSD100 dataset that are competitive with advanced systems.

03. Eliminates the need for Wiener filter postprocessing.
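
For readers unfamiliar with the GNSDR metric named in the findings, the following is a minimal sketch of how it is commonly defined: NSDR is the SDR improvement of the estimate over the unprocessed mixture, and GNSDR is the length-weighted mean of NSDR over all clips. The function names are hypothetical, and `simple_sdr` is a plain energy-ratio SDR used here as a stand-in assumption; the summary does not specify the evaluation code, and published results typically use the BSS Eval toolkit instead.

```python
import numpy as np

def simple_sdr(reference, estimate, eps=1e-12):
    """Energy-ratio SDR in dB (a simplification, not the BSS Eval decomposition)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    error = reference - estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps))

def nsdr(reference, estimate, mixture):
    """Normalized SDR: improvement of the estimate over the raw mixture."""
    return simple_sdr(reference, estimate) - simple_sdr(reference, mixture)

def gnsdr(nsdr_values, lengths):
    """Global NSDR: mean of per-clip NSDR weighted by clip length."""
    return float(np.average(nsdr_values, weights=lengths))

# Toy example with made-up signals (illustrative only, not data from the paper).
reference = np.array([1.0, -1.0, 0.5])
mixture = reference + 0.5      # mixture is far from the clean vocal
estimate = reference + 0.1     # estimate is closer to the clean vocal
clip_nsdr = nsdr(reference, estimate, mixture)

# Two hypothetical clips with NSDR 2 dB and 4 dB and lengths 1 and 3.
overall = gnsdr([2.0, 4.0], [1, 3])
```

Weighting by clip length means longer clips contribute proportionally more to the overall score than short ones.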

Abstract

Separating a singing voice from its music accompaniment remains an important challenge in the field of music information retrieval. We present a unique neural network approach inspired by a technique that has revolutionized the field of vision: pixel-wise image classification, which we combine with cross entropy loss and pretraining of the CNN as an autoencoder on singing voice spectrograms. The pixel-wise classification technique directly estimates the sound source label for each time-frequency (T-F) bin in our spectrogram image, thus eliminating common pre- and postprocessing tasks. The proposed network is trained by using the Ideal Binary Mask (IBM) as the target output label. The IBM identifies the dominant sound source in each T-F bin of the magnitude spectrogram of a mixture signal, by considering each T-F bin as a pixel with a multi-label (for each sound source). Cross entropy is…
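
As an illustration of the training target described in the abstract, here is a minimal NumPy sketch of an ideal binary mask and a per-bin cross entropy loss. It assumes a two-source case (vocals and accompaniment) with random toy magnitudes; the function names, array shapes, and the additive mixture approximation are assumptions made for this example and are not taken from the paper's implementation.

```python
import numpy as np

def ideal_binary_mask(vocal_mag, accomp_mag):
    """Return 1 where the vocal magnitude dominates a T-F bin, else 0."""
    return (vocal_mag > accomp_mag).astype(np.float32)

def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean per-bin cross entropy between predicted probabilities and 0/1 labels."""
    pred = np.clip(np.asarray(pred, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))

# Toy magnitude spectrograms: 4 frequency bins x 5 time frames (made-up data).
rng = np.random.default_rng(0)
vocal_mag = rng.random((4, 5))
accomp_mag = rng.random((4, 5))

# Training target: one binary label per T-F bin ("pixel").
ibm = ideal_binary_mask(vocal_mag, accomp_mag)

# Assumption: approximate the mixture magnitude as the sum of source magnitudes.
mixture_mag = vocal_mag + accomp_mag

# An uninformed prediction (0.5 everywhere) gives a loss of ln(2) per bin.
uniform_loss = binary_cross_entropy(np.full(ibm.shape, 0.5), ibm)

# Applying a mask to the mixture keeps only the bins labeled as vocal.
vocal_estimate = ibm * mixture_mag
```

In this two-source setting the accompaniment mask is simply `1 - ibm`, so a single binary label per bin suffices; a setup with more sources would need one label per source in each bin.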

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis