Audio Source Separation via Multi-Scale Learning with Dilated Dense U-Nets
Vivek Sivaraman Narayanaswamy, Sameeksha Katoch, Jayaraman J. Thiagarajan, Huan Song, Andreas Spanias

TL;DR
This paper enhances audio source separation by integrating dilated convolutions and dense connections into U-Net architectures, improving multi-scale feature extraction and temporal modeling for better separation performance.
Contribution
It introduces adaptive dilated convolutions and dense connections into U-Net models, optimizing multi-scale feature extraction for audio source separation.
Findings
Improved separation performance on the MUSDB dataset.
Dilated convolutions increase temporal receptive fields effectively.
Dense connections enhance feature reuse and gradient flow.
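To make the receptive-field finding concrete, here is a minimal sketch of the standard receptive-field formula for a stack of stride-1 1-D convolutions. The function name, kernel size, and dilation schedules are illustrative assumptions and are not taken from the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 1-D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Hypothetical four-layer stacks with kernel size 3 (values chosen for illustration).
regular = receptive_field(3, [1, 1, 1, 1])  # no dilation
dilated = receptive_field(3, [1, 2, 4, 8])  # exponentially growing dilation

print(regular, dilated)  # prints "9 31"
```

With the same number of layers and weights, the dilated stack covers more than three times as many input samples, which is the sense in which dilation enlarges temporal context.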
Abstract
Modern audio source separation techniques rely on optimizing sequence model architectures, such as 1D-CNNs, on mixture recordings to generalize well to unseen mixtures. Specifically, recent focus is on time-domain architectures such as Wave-U-Net, which exploit temporal context by extracting multi-scale features. However, the optimality of the feature extraction process in these architectures has not been well investigated. In this paper, we examine and recommend critical architectural changes that forge an optimal multi-scale feature extraction process. To this end, we replace regular 1-D convolutions with adaptive dilated convolutions that have an innate capability of capturing increased context by using large temporal receptive fields. We also investigate the impact of dense connections on the extraction process, which encourage feature reuse and better gradient flow. The dense…
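The two ingredients named in the abstract, dilated 1-D convolutions and dense connections, can be illustrated with a toy pure-Python sketch. This is not the authors' architecture: the helper names, the shared all-ones kernel, the single output channel per layer, and the dilation schedule are all assumptions made for illustration.

```python
def dilated_conv1d(x, kernel, dilation):
    """Zero-padded, length-preserving dilated 1-D convolution of one channel.

    Assumes an odd kernel length so the padding is symmetric.
    """
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(x))
    ]


def dense_block(x, kernel, dilations):
    """Toy dense block: every layer reads ALL earlier feature maps.

    Each layer convolves every earlier map with the same (hypothetical)
    kernel, sums over channels, applies ReLU, and appends one new map.
    """
    features = [list(x)]
    for d in dilations:
        per_channel = [dilated_conv1d(f, kernel, d) for f in features]
        new_map = [max(0.0, sum(vals)) for vals in zip(*per_channel)]
        features.append(new_map)  # dense connection: keep all maps for later layers
    return features


# An impulse input makes the temporal reach of the block visible.
impulse = [0.0] * 7 + [1.0] + [0.0] * 7
maps = dense_block(impulse, [1.0, 1.0, 1.0], dilations=[1, 2])
support = [i for i, v in enumerate(maps[-1]) if v != 0.0]
print(support)  # prints "[4, 5, 6, 7, 8, 9, 10]"
```

The last feature map responds over seven samples around the impulse, matching the receptive field of two kernel-size-3 layers with dilations 1 and 2, while the earlier maps remain available to every later layer.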
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net · Dense Connections
