Investigating U-Nets with various Intermediate Blocks for Spectrogram-based Singing Voice Separation
Woosung Choi, Minseok Kim, Jaehwa Chung, Daewon Lee, and Soonyoung Jung

TL;DR
This paper evaluates various intermediate blocks in U-Net architectures for singing voice separation, demonstrating that certain configurations significantly improve SDR performance on the MUSDB dataset.
Contribution
It introduces and compares multiple intermediate spectrogram transformation blocks within U-Nets for SVS, achieving state-of-the-art results with a specific block type.
Findings
A block composed of convolutional and fully-connected layers improves SDR on the MUSDB singing voice separation task by 0.9 dB over the previous state of the art.
U-Net variants with different blocks are systematically compared.
The best model sets a new SDR benchmark on MUSDB for SVS.
Abstract
Singing Voice Separation (SVS) tries to separate singing voice from a given mixed musical signal. Recently, many U-Net-based models have been proposed for the SVS task, but there were no existing works that evaluate and compare various types of intermediate blocks that can be used in the U-Net architecture. In this paper, we introduce a variety of intermediate spectrogram transformation blocks. We implement U-nets based on these blocks and train them on complex-valued spectrograms to consider both magnitude and phase. These networks are then compared on the SDR metric. When using a particular block composed of convolutional and fully-connected layers, it achieves state-of-the-art SDR on the MUSDB singing voice separation task by a large margin of 0.9 dB. Our code and models are available online.
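The abstract compares networks on the SDR metric. As a rough illustration of what that number measures, here is a minimal sketch of the plain signal-to-distortion ratio, 10·log10(‖s‖² / ‖s − ŝ‖²), in NumPy. This is an assumption-laden simplification: the function name `sdr`, the `eps` stabilizer, and the synthetic sine "vocal" are all hypothetical, and published MUSDB scores are typically computed with BSS Eval tooling, which differs in detail from this basic form.

```python
import numpy as np

def sdr(reference, estimate, eps=1e-8):
    """Plain SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Illustrative only; not the evaluation code used in the paper.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)
    # eps guards against division by zero for a perfect estimate
    return 10.0 * np.log10((signal_power + eps) / (error_power + eps))

# Toy data (hypothetical): a 440 Hz sine stands in for the true vocal track,
# and the "separated" estimate is that sine plus Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
vocals = np.sin(2 * np.pi * 440.0 * t)
estimate = vocals + 0.1 * rng.standard_normal(t.shape)

score = sdr(vocals, estimate)
print(f"SDR: {score:.2f} dB")
```

Because the scale is logarithmic, a 0.9 dB gain corresponds to roughly a 19% reduction in distortion energy relative to the signal energy (10^(-0.09) ≈ 0.81).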
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
Methods: Convolution · Concatenated Skip Connection · U-Net
