Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation

Po-Sen Huang; Minje Kim; Mark Hasegawa-Johnson; Paris Smaragdis

arXiv:1502.04149·cs.SD·October 2, 2015

TL;DR

This paper presents a joint optimization approach combining masking functions and deep recurrent neural networks for monaural source separation, significantly improving performance across speech, singing voice, and denoising tasks.

Contribution

It introduces a novel joint training method with a masking layer and discriminative criterion, enhancing separation quality over traditional models.

Findings

1. Achieved 2.30 to 4.98 dB SDR gain over NMF in speech separation

2. Attained 2.30 to 2.48 dB GNSDR and 4.32 to 5.42 dB GSIR gains in singing voice separation

3. Outperformed NMF and DNN baselines in speech denoising

Abstract

Monaural source separation is important for many real-world applications. It is challenging because only a single channel of information is available and, without any constraints, infinitely many solutions are possible. In this paper, we explore joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks, including monaural speech separation, monaural singing voice separation, and speech denoising. The joint optimization of the deep recurrent neural networks with an extra masking layer enforces a reconstruction constraint. Moreover, we explore a discriminative criterion for training neural networks to further enhance the separation performance. We evaluate the proposed system on the TSP, MIR-1K, and TIMIT datasets for speech separation, singing voice separation, and speech denoising tasks, respectively. Our approaches achieve…
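The two ideas named in the abstract, a masking layer that enforces a reconstruction constraint and a discriminative training criterion, can be illustrated with a minimal NumPy sketch. It assumes two sources, magnitude spectrograms as inputs, a soft ratio mask, and a squared-error criterion with a cross-source penalty applied to the masked outputs. The function names, the `eps` stabilizer, and the `gamma` value are hypothetical choices for illustration, not details taken from the abstract.

```python
import numpy as np


def soft_mask_layer(y1_hat, y2_hat, mixture_mag, eps=1e-8):
    """Turn two raw network outputs into masked source estimates.

    The estimates sum to the mixture magnitude, which is the
    reconstruction constraint. `eps` is an assumed numerical stabilizer.
    """
    a1 = np.abs(y1_hat)
    a2 = np.abs(y2_hat)
    mask1 = a1 / (a1 + a2 + eps)          # soft ratio mask in [0, 1)
    s1 = mask1 * mixture_mag              # estimate of source 1
    s2 = (1.0 - mask1) * mixture_mag      # complementary estimate of source 2
    return s1, s2


def discriminative_loss(s1, s2, y1, y2, gamma=0.05):
    """Squared error to the matching target, minus a penalty term that
    rewards distance from the other source's target.

    `gamma` is a hypothetical weight; gamma=0 gives plain squared error.
    """
    fit = np.sum((s1 - y1) ** 2) + np.sum((s2 - y2) ** 2)
    cross = np.sum((s1 - y2) ** 2) + np.sum((s2 - y1) ** 2)
    return fit - gamma * cross


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy magnitude spectrograms: 4 frequency bins by 5 frames.
    y1, y2 = rng.random((4, 5)), rng.random((4, 5))
    mixture = y1 + y2
    # Stand-ins for raw network outputs (noisy versions of the targets).
    out1 = y1 + 0.1 * rng.random((4, 5))
    out2 = y2 + 0.1 * rng.random((4, 5))
    s1, s2 = soft_mask_layer(out1, out2, mixture)
    print("sums to mixture:", np.allclose(s1 + s2, mixture))
    print("loss:", discriminative_loss(s1, s2, y1, y2))
```

Because the second mask is defined as the complement of the first, the two estimates add up to the mixture by construction, so the constraint holds regardless of what the network outputs. In a joint-training setup, the loss would be computed on these masked estimates so that gradients flow through the mask into the network.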
