Audio Source Separation via Multi-Scale Learning with Dilated Dense U-Nets

Vivek Sivaraman Narayanaswamy; Sameeksha Katoch; Jayaraman J. Thiagarajan; Huan Song; Andreas Spanias

arXiv:1904.04161 · cs.LG · April 9, 2019 · 5 cites

TL;DR

This paper enhances audio source separation by integrating dilated convolutions and dense connections into U-Net architectures, improving multi-scale feature extraction and temporal modeling for better separation performance.

Contribution

It introduces adaptive dilated convolutions and dense connections into U-Net models, optimizing multi-scale feature extraction for audio source separation.

Findings

01. Improved separation performance on the MUSDB dataset.

02. Dilated convolutions effectively increase the temporal receptive field.

03. Dense connections enhance feature reuse and gradient flow.
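
Findings 02 and 03 can be made concrete with a little arithmetic. The sketch below is an illustration, not the authors' code. It assumes a stack of stride-1 1-D convolutions with kernel size k and per-layer dilations d_i, for which the receptive field is 1 + sum over i of (k - 1) * d_i, and a dense block in which layer i receives the block input plus the outputs of all i earlier layers. The function names, kernel size, dilation schedule, and growth rate are hypothetical; this page does not state the values used in the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 1-D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def dense_input_channels(initial_channels, growth_rate, num_layers):
    """Input channel count seen by each layer of a dense block.

    Layer i receives the block input concatenated with the outputs of the
    i preceding layers, each of which adds `growth_rate` channels.
    """
    return [initial_channels + i * growth_rate for i in range(num_layers)]


if __name__ == "__main__":
    # Hypothetical settings chosen only for illustration.
    print(receptive_field(3, [1, 1, 1, 1]))   # four regular conv layers
    print(receptive_field(3, [1, 2, 4, 8]))   # same depth, doubling dilation
    print(dense_input_channels(24, 12, 4))
```

With the same depth and kernel size, the dilated stack covers a much longer span of the waveform than the regular one, at no extra parameter cost. The growing channel counts show how later layers in a dense block can reuse every earlier feature map.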

Abstract

Modern audio source separation techniques rely on optimizing sequence model architectures, such as 1D-CNNs, on mixture recordings to generalize well to unseen mixtures. Specifically, recent focus is on time-domain based architectures such as Wave-U-Net, which exploit temporal context by extracting multi-scale features. However, the optimality of the feature extraction process in these architectures has not been well investigated. In this paper, we examine and recommend critical architectural changes that forge an optimal multi-scale feature extraction process. To this end, we replace regular 1-D convolutions with adaptive dilated convolutions that have an innate capability of capturing increased context by using large temporal receptive fields. We also investigate the impact of dense connections on the extraction process that encourage feature reuse and better gradient flow. The dense…
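
As a rough illustration of the operation the abstract refers to, the following pure-Python sketch computes a 1-D dilated convolution on a short signal. It uses a fixed dilation and the cross-correlation convention common in deep learning libraries; it does not reproduce the paper's adaptive dilation scheme, whose details are not given on this page. The name `dilated_conv1d` and the toy inputs are assumptions made for the example.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation, no padding).

    Kernel taps are spaced `dilation` samples apart, so a kernel of length k
    spans (k - 1) * dilation + 1 input samples.
    """
    if dilation < 1:
        raise ValueError("dilation must be a positive integer")
    span = (len(kernel) - 1) * dilation
    return [
        sum(w * signal[t + i * dilation] for i, w in enumerate(kernel))
        for t in range(len(signal) - span)
    ]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    print(dilated_conv1d(x, [1, 1], dilation=1))  # adjacent samples summed
    print(dilated_conv1d(x, [1, 1], dilation=2))  # samples two steps apart
```

The kernel has the same two weights in both calls; only the spacing between taps changes, which is how dilation widens temporal context without adding parameters.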

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis

Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net · Dense Connections