Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation

Daniel Stoller; Sebastian Ewert; Simon Dixon

arXiv:1806.03185 · cs.SD · June 11, 2018 · 75 cites

TL;DR

This paper introduces Wave-U-Net, a time-domain neural network for audio source separation that models phase information and performs comparably to spectrogram-based methods. It also addresses outlier problems in SDR-based evaluation by reporting rank-based statistics.

Contribution

The paper presents Wave-U-Net, a novel end-to-end time-domain architecture with multi-scale processing, architectural improvements, and a new evaluation reporting method for audio source separation.

Findings

01. Wave-U-Net performs comparably to state-of-the-art spectrogram-based models.

02. Architectural enhancements improve separation quality and reduce artifacts.

03. Reporting rank-based statistics mitigates outlier issues in SDR evaluation.
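
The third finding can be illustrated with a small sketch. It assumes that "rank-based statistics" means summaries such as the median and the median absolute deviation (MAD), and it uses made-up per-segment SDR values in which one near-silent segment produces an extreme outlier. The function names and the numbers are hypothetical and are not taken from the paper.

```python
import statistics


def median_abs_deviation(values):
    """Unscaled MAD: the median of absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def summarize_sdr(sdr_values):
    """Return mean/std and median/MAD summaries for a list of SDR values (dB)."""
    return {
        "mean": statistics.mean(sdr_values),
        "std": statistics.stdev(sdr_values),
        "median": statistics.median(sdr_values),
        "mad": median_abs_deviation(sdr_values),
    }


# Hypothetical per-segment SDR values in dB; the last entry stands in for a
# near-silent segment, where SDR can take an extreme value.
sdr_per_segment = [5.0, 6.0, 5.5, 6.5, 5.8, -40.0]
summary = summarize_sdr(sdr_per_segment)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this toy data the single outlier drags the mean below zero, while the median stays close to the typical segment score, which is the kind of robustness the finding refers to.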

Abstract

Models for audio source separation usually operate on the magnitude spectrum, which ignores phase information and makes separation performance dependent on hyper-parameters for the spectral front-end. Therefore, we investigate end-to-end source separation in the time domain, which allows modelling phase information and avoids fixed spectral transformations. Due to high sampling rates for audio, employing a long temporal input context on the sample level is difficult, but required for high-quality separation results because of long-range temporal correlations. In this context, we propose the Wave-U-Net, an adaptation of the U-Net to the one-dimensional time domain, which repeatedly resamples feature maps to compute and combine features at different time scales. We introduce further architectural improvements, including an output layer that enforces source additivity, an upsampling…
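
As a rough illustration of what an output layer that "enforces source additivity" could look like, the sketch below assumes one simple way to guarantee that the estimated sources sum to the mixture: predict all sources but one, then define the last as the mixture minus the sum of the others. The abstract does not spell out the mechanism, so this is an assumption, and the function name `difference_output` and the sample values are hypothetical.

```python
def difference_output(mixture, estimates):
    """Append a final source so that all sources sum to the mixture.

    mixture:   list of float samples of the mixed signal.
    estimates: list of K-1 estimated sources, each the same length as mixture.
    Returns a list of K sources; the last is mixture minus the sum of the rest.
    """
    for est in estimates:
        if len(est) != len(mixture):
            raise ValueError("every estimate must match the mixture length")
    residual = [
        m - sum(est[t] for est in estimates)
        for t, m in enumerate(mixture)
    ]
    return [list(est) for est in estimates] + [residual]


# Toy example with made-up sample values (not real audio).
mixture = [1.0, 0.5, -0.25]
estimates = [[0.4, 0.1, 0.0], [0.3, 0.2, -0.1]]
sources = difference_output(mixture, estimates)

# By construction, the sources add up to the mixture at every time step.
for t, m in enumerate(mixture):
    total = sum(src[t] for src in sources)
    print(f"t={t}: mixture={m:+.2f}, sum of sources={total:+.2f}")
```

With this construction additivity holds exactly regardless of how good the first K-1 estimates are; any error in them ends up in the last source.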

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis

Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net