Singing Voice Separation Using a Deep Convolutional Neural Network Trained by Ideal Binary Mask and Cross Entropy
Kin Wah Edward Lin, Balamurali B.T., Enyan Koh, Simon Lui, Dorien Herremans

TL;DR
This paper introduces a deep convolutional neural network that treats singing voice separation as a pixel-wise classification problem, using ideal binary masks and cross entropy loss to improve accuracy and eliminate complex postprocessing.
Contribution
The novel approach applies pixel-wise image classification techniques to singing voice separation, outperforming previous methods and simplifying the processing pipeline.
Findings
Outperforms MIREX 2016 and 2014 winners in GNSDR metrics.
Achieves results on the DSD100 dataset that are competitive with other advanced systems.
Eliminates the need for Wiener filter postprocessing.
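For readers unfamiliar with the GNSDR metric named above, the following is a minimal sketch of how it is commonly aggregated: NSDR is the SDR of the separated vocal minus the SDR of the unprocessed mixture, and GNSDR is the mean of NSDR over clips weighted by clip length. The function name `gnsdr` and its arguments are illustrative assumptions, not taken from the paper, and the per-clip SDR values are assumed to come from a separate BSS Eval computation.

```python
import numpy as np

def gnsdr(sdr_estimate, sdr_mixture, clip_lengths):
    """Length-weighted mean of NSDR over a set of clips (hypothetical helper).

    sdr_estimate: SDR of each separated vocal against its reference, in dB.
    sdr_mixture:  SDR of each raw mixture against the same reference, in dB.
    clip_lengths: length of each clip (seconds or samples), used as weights.
    """
    # NSDR: improvement of the estimate over doing nothing (the raw mixture).
    nsdr = np.asarray(sdr_estimate, dtype=float) - np.asarray(sdr_mixture, dtype=float)
    weights = np.asarray(clip_lengths, dtype=float)
    return float(np.sum(weights * nsdr) / np.sum(weights))

# Illustrative numbers only, not results from the paper.
score = gnsdr(sdr_estimate=[10.0, 6.0], sdr_mixture=[4.0, 2.0], clip_lengths=[30.0, 10.0])
```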
Abstract
Separating a singing voice from its music accompaniment remains an important challenge in the field of music information retrieval. We present a unique neural network approach inspired by a technique that has revolutionized the field of vision: pixel-wise image classification, which we combine with cross entropy loss and pretraining of the CNN as an autoencoder on singing voice spectrograms. The pixel-wise classification technique directly estimates the sound source label for each time-frequency (T-F) bin in our spectrogram image, thus eliminating common pre- and postprocessing tasks. The proposed network is trained by using the Ideal Binary Mask (IBM) as the target output label. The IBM identifies the dominant sound source in each T-F bin of the magnitude spectrogram of a mixture signal, by considering each T-F bin as a pixel with a multi-label (for each sound source). Cross entropy is…
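The IBM target and cross entropy objective described in the abstract can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a single vocal-versus-accompaniment mask rather than one label per source, and the function names, the tie-breaking rule, and the 0.5 threshold are all hypothetical choices.

```python
import numpy as np

def ideal_binary_mask(vocal_mag, accomp_mag):
    """Label each T-F bin 1 where the vocal magnitude dominates, else 0.

    Ties are assigned to the accompaniment (an assumption of this sketch).
    """
    return (np.asarray(vocal_mag) > np.asarray(accomp_mag)).astype(np.float32)

def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean per-bin cross entropy between predicted probabilities and the IBM."""
    pred = np.clip(np.asarray(pred, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))

def apply_mask(mixture_mag, vocal_prob, threshold=0.5):
    """Keep only the mixture bins classified as vocal (threshold is an assumption)."""
    return np.asarray(mixture_mag) * (np.asarray(vocal_prob) > threshold)

# Toy 2x2 magnitude spectrograms (frequency x time), illustrative values only.
vocal = np.array([[3.0, 1.0], [0.5, 2.0]])
accomp = np.array([[1.0, 2.0], [0.5, 1.0]])
ibm = ideal_binary_mask(vocal, accomp)

# An uninformative prediction of 0.5 everywhere costs log(2) per bin.
loss_uniform = binary_cross_entropy(np.full_like(ibm, 0.5), ibm)
vocal_estimate = apply_mask(vocal + accomp, ibm)
```

Because the network output is already a per-bin class decision, the estimated vocal magnitude is obtained by masking the mixture directly, which is consistent with the claim that no Wiener filter postprocessing is required.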
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
