Generalization Challenges for Neural Architectures in Audio Source Separation

Shariq Mobin; Brian Cheung; Bruno Olshausen

arXiv:1803.08629·cs.SD·May 29, 2018

TL;DR

This paper compares recurrent and convolutional neural networks for audio source separation and shows that convolutional models achieve state-of-the-art results with an order of magnitude fewer parameters and generalize better to new environments.

Contribution

It introduces a convolutional neural network approach to source separation that outperforms recurrent models in parameter efficiency and robustness, and presents a new dataset, RealTalkLibri, for testing source separation in real-world environments.

Findings

01 Convolutional models achieve state-of-the-art separation performance.

02 Convolutional models generalize better to unseen environments.

03 Environmental acoustics significantly affect model performance.

Abstract

Recent work has shown that recurrent neural networks can be trained to separate individual speakers in a sound mixture with high fidelity. Here we explore convolutional neural network models as an alternative and show that they achieve state-of-the-art results with an order of magnitude fewer parameters. We also characterize and compare the robustness and ability of these different approaches to generalize under three different test conditions: longer time sequences, the addition of intermittent noise, and different datasets not seen during training. For the last condition, we create a new dataset, RealTalkLibri, to test source separation in real-world environments. We show that the acoustics of the environment have significant impact on the structure of the waveform and the overall performance of neural network models, with the convolutional model showing superior ability to generalize…

Code & Models

Repositories

ShariqM/source_separation · tf · Official

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis