Deep Speech Denoising with Vector Space Projections

Jeff Hetherly; Paul Gamble; Maria Barrios; Cory Stephenson; Karl Ni

arXiv:1804.10669 · cs.SD · May 1, 2018

Open Access

TL;DR

This paper introduces a neural network-based algorithm that uses source-contrastive embedding spaces and dual objectives to effectively denoise speech from a single microphone, even in dynamic noise conditions.

Contribution

The paper presents a denoising method that combines source-contrastive estimation with a continuous inference mask, aiming for better generalization and lower computational cost than prior techniques.

Findings

01. Achieves competitive denoising accuracy compared to state-of-the-art methods.

02. Operates effectively on unseen speakers and noise conditions.

03. Offers a computationally efficient alternative to traditional algorithms.

Abstract

We propose an algorithm to denoise speakers from a single microphone in the presence of non-stationary and dynamic noise. Our approach is inspired by the recent success of neural network models separating speakers from other speakers and singers from instrumental accompaniment. Unlike prior art, we leverage embedding spaces produced with source-contrastive estimation, a technique derived from negative sampling techniques in natural language processing, while simultaneously obtaining a continuous inference mask. Our embedding space directly optimizes for the discrimination of speaker and noise by jointly modeling their characteristics. This space is generalizable in that it is not speaker or noise specific and is capable of denoising speech even if the model has not seen the speaker in the training set. Parameters are trained with dual objectives: one that promotes a selective bandpass…
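
As a rough illustration of the source-contrastive idea described in the abstract, the sketch below scores each time-frequency embedding against one vector per source, in the style of negative sampling, and then reads a continuous mask from the speech score. It is not the authors' implementation. The function names, the two-row `source_vectors` layout (speech, noise), the `dominant` labels, and the sigmoid-based mask are all assumptions made for illustration. The second training objective mentioned in the abstract is not modeled here.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)


def contrastive_loss(embeddings, source_vectors, dominant):
    """Negative-sampling-style loss (illustrative, hypothetical names).

    embeddings:     (N, D) one embedding per time-frequency bin
    source_vectors: (2, D) row 0 = speech, row 1 = noise (assumed layout)
    dominant:       (N,)   index of the source that dominates each bin
    """
    scores = embeddings @ source_vectors.T  # (N, 2) dot products
    # +1 for the dominant source of a bin, -1 for the other (the "negative").
    is_dominant = np.arange(2)[None, :] == dominant[:, None]
    signs = np.where(is_dominant, 1.0, -1.0)
    return -np.mean(np.sum(log_sigmoid(signs * scores), axis=1))


def soft_speech_mask(embeddings, source_vectors):
    # Continuous mask in [0, 1]: sigmoid of the score against the speech vector.
    # Using a sigmoid here is an assumption, not taken from the paper.
    speech_scores = embeddings @ source_vectors[0]
    return np.exp(log_sigmoid(speech_scores))


# Toy usage with random data (shapes only; values carry no meaning).
rng = np.random.default_rng(0)
toy_sources = rng.normal(size=(2, 4))
toy_embeddings = rng.normal(size=(6, 4))
toy_dominant = rng.integers(0, 2, size=6)

toy_loss = contrastive_loss(toy_embeddings, toy_sources, toy_dominant)
toy_mask = soft_speech_mask(toy_embeddings, toy_sources)
print("loss:", toy_loss)
print("mask:", toy_mask)
```

In a real system the embeddings would come from a network applied to a spectrogram, and the mask would be multiplied elementwise with the noisy magnitude spectrogram before inverting back to a waveform.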

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis