A Training Framework for Stereo-Aware Speech Enhancement using Deep Neural Networks

Bahareh Tolooshams; Kazuhito Koishida

arXiv:2112.04939·eess.AS·February 2, 2022

Open Access

TL;DR

This paper introduces a stereo-aware training framework for speech enhancement using deep neural networks, focusing on preserving spatial audio cues while improving speech quality.

Contribution

The paper proposes a novel, model-independent training loss that preserves the stereo image in deep learning-based speech enhancement.

Findings

01. Improved stereo image preservation in enhanced speech.

02. Better subjective listening test scores.

03. Enhanced overall speech quality.

Abstract

Deep learning-based speech enhancement has shown unprecedented performance in recent years. The most popular mono speech enhancement frameworks are end-to-end networks that map the noisy mixture to an estimate of the clean speech. With growing computational power and the availability of multichannel microphone recordings, prior works have aimed to incorporate spatial statistics along with spectral information to boost performance. Despite improvements in the enhancement performance of the mono output, spatial image preservation and subjective evaluation have received little attention in the literature. This paper proposes a novel stereo-aware framework for speech enhancement, i.e., a training loss for deep learning-based speech enhancement that preserves the spatial image while enhancing the stereo mixture. The proposed framework is model independent, hence it can be applied to any deep…

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Advanced Adaptive Filtering Techniques