Danna-Sep: Unite to separate them all
Chin-Yun Yu, Kin-Wai Cheuk

TL;DR
Danna-Sep is a novel music source separation framework that combines spectrogram and waveform models, outperforming state-of-the-art methods by leveraging their complementary strengths.
Contribution
It introduces a simple yet effective fusion of spectrogram and waveform models for improved music source separation performance.
Findings
Danna-Sep surpasses SoTA models in Source-to-Distortion Ratio
Combining models yields better separation for harmonic and percussive sources
Simple linear combination achieves significant performance gains
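To make the last finding concrete, here is a minimal sketch of linearly combining the estimates that several separation models produce for one source. It is an illustration under stated assumptions, not the authors' implementation: the function name `blend_estimates`, the array layout `(channels, samples)`, and the weight values are all hypothetical, and the summary does not say how the actual weights are chosen.

```python
import numpy as np


def blend_estimates(estimates, weights):
    """Return the weighted sum of per-model waveform estimates for one source.

    estimates: sequence of arrays with identical shape (channels, samples),
               one per model (hypothetical layout).
    weights:   sequence of floats, one per model (hypothetical values).
    """
    # Stack into shape (models, channels, samples).
    stacked = np.stack([np.asarray(e, dtype=float) for e in estimates], axis=0)
    w = np.asarray(weights, dtype=float)
    if w.ndim != 1 or w.shape[0] != stacked.shape[0]:
        raise ValueError("need exactly one weight per model estimate")
    # Contract the model axis: sum_m w[m] * stacked[m].
    return np.tensordot(w, stacked, axes=1)


# Toy stand-ins for the outputs of three models on a stereo, 4-sample clip.
est_a = np.full((2, 4), 1.0)
est_b = np.full((2, 4), 2.0)
est_c = np.full((2, 4), 3.0)

# Illustrative weights only; they are not taken from the paper.
blended = blend_estimates([est_a, est_b, est_c], [0.5, 0.25, 0.25])
```

In a setup like this, a separate weight vector could be kept for each source (for example vocals versus drums), so that the blend can lean on whichever model family is stronger for that source.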
Abstract
Deep learning-based music source separation has gained a lot of interest in the last decade. Most existing methods operate on either spectrograms or waveforms. Spectrogram-based models learn suitable masks for separating the magnitude spectrogram into different sources, and waveform-based models directly generate the waveforms of individual sources. The two types of models have complementary strengths; the former is superior on harmonic sources such as vocals, while the latter demonstrates better results for percussion and bass instruments. In this work, we improved upon the state-of-the-art (SoTA) models and successfully combined the best of both worlds. The backbones of the proposed framework, dubbed Danna-Sep, are two spectrogram-based models, a modified X-UMX and a U-Net, and an enhanced Demucs as the waveform-based model. Given an input mixture, we linearly combined…
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Blind Source Separation Techniques
Methods: Convolution · Concatenated Skip Connection · Max Pooling · U-Net
