Deep Speech Denoising with Vector Space Projections
Jeff Hetherly, Paul Gamble, Maria Barrios, Cory Stephenson, Karl Ni

TL;DR
This paper introduces a neural network-based algorithm that uses source-contrastive embedding spaces and dual training objectives to denoise speech recorded with a single microphone, including under non-stationary and dynamic noise.
Contribution
It presents a novel denoising method that combines source-contrastive estimation with a continuous inference mask, improving generalization and computational efficiency over prior techniques.
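To illustrate what a continuous (soft) inference mask derived from an embedding space can look like, here is a minimal sketch. It assumes each time-frequency bin has a D-dimensional embedding and that speech and noise are each represented by a learned vector (`speech_vec`, `noise_vec` are hypothetical names). The mask is the logistic sigmoid of the difference between the two similarity scores, which equals a two-way softmax. This is an illustration of the general idea, not the authors' exact formulation.

```python
import numpy as np

def continuous_mask(embeddings, speech_vec, noise_vec):
    """Soft speech mask in [0, 1] for every time-frequency bin.

    embeddings: shape (T, F, D), one D-dim embedding per bin (assumed layout).
    speech_vec, noise_vec: shape (D,), hypothetical learned source vectors.
    """
    # Similarity of each bin's embedding to each source vector: shape (T, F)
    s_speech = embeddings @ speech_vec
    s_noise = embeddings @ noise_vec
    # Logistic sigmoid of the score difference (a two-way softmax),
    # written via tanh so large scores do not overflow np.exp
    return 0.5 * (1.0 + np.tanh(0.5 * (s_speech - s_noise)))

def apply_mask(noisy_mag, mask):
    # Element-wise masking of the noisy magnitude spectrogram, shape (T, F)
    return noisy_mag * mask

rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 5, 3))      # 4 frames, 5 frequency bins, D = 3
speech_vec = np.array([1.0, 0.0, 0.0])
noise_vec = np.array([-1.0, 0.0, 0.0])
mask = continuous_mask(emb, speech_vec, noise_vec)
denoised = apply_mask(np.abs(rng.standard_normal((4, 5))), mask)
```

Because the mask is continuous rather than binary, bins where speech and noise overlap are attenuated partially instead of being kept or discarded outright.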
Findings
Achieves competitive denoising accuracy compared to state-of-the-art methods.
Operates effectively on unseen speakers and noise conditions.
Offers a computationally efficient alternative to traditional algorithms.
Abstract
We propose an algorithm to denoise speakers from a single microphone in the presence of non-stationary and dynamic noise. Our approach is inspired by the recent success of neural network models separating speakers from other speakers and singers from instrumental accompaniment. Unlike prior art, we leverage embedding spaces produced with source-contrastive estimation, a technique derived from negative sampling techniques in natural language processing, while simultaneously obtaining a continuous inference mask. Our embedding space directly optimizes for the discrimination of speaker and noise by jointly modeling their characteristics. This space is generalizable in that it is not speaker or noise specific and is capable of denoising speech even if the model has not seen the speaker in the training set. Parameters are trained with dual objectives: one that promotes a selective bandpass…
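The abstract says source-contrastive estimation is derived from negative sampling in natural language processing. As a rough illustration, the sketch below adapts the word2vec-style negative-sampling objective to sources: each bin's embedding is pulled toward the vector of its dominant source and pushed away from the other sources, which act as negatives. The label convention (0 = speech, 1 = noise) and the `source_vecs` matrix are assumptions for illustration; this is not the paper's exact loss, and it covers only one of the dual objectives (the abstract's description of the second is truncated).

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x))
    return -np.logaddexp(0.0, -x)

def source_contrastive_loss(embeddings, labels, source_vecs):
    """Negative-sampling style loss over sources (illustrative sketch).

    embeddings: (N, D) embeddings for N time-frequency bins.
    labels: (N,) ints, dominant source per bin (0 = speech, 1 = noise; assumed).
    source_vecs: (S, D) hypothetical learned vector per source.
    """
    scores = embeddings @ source_vecs.T           # (N, S) similarity scores
    n = np.arange(len(labels))
    pos = scores[n, labels]                       # score with the bin's own source
    # Every other source is treated as a negative sample for this bin
    neg_mask = np.ones_like(scores, dtype=bool)
    neg_mask[n, labels] = False
    neg = np.where(neg_mask, log_sigmoid(-scores), 0.0).sum(axis=1)
    # Maximize log sigma(pos) + sum log sigma(-neg); return mean negative value
    return float(-(log_sigmoid(pos) + neg).mean())

source_vecs = np.array([[1.0, 0.0], [-1.0, 0.0]])   # speech, noise
labels = np.array([0, 1])
well_separated = np.array([[2.0, 0.0], [-2.0, 0.0]])
confused = -well_separated
```

Embeddings that sit close to their own source vector and far from the other yield a lower loss, which is the discrimination between speaker and noise that the abstract describes.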
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
