A Training Framework for Stereo-Aware Speech Enhancement using Deep Neural Networks
Bahareh Tolooshams, Kazuhito Koishida

TL;DR
This paper introduces a stereo-aware training framework for speech enhancement using deep neural networks, focusing on preserving spatial audio cues while improving speech quality.
Contribution
It proposes a novel, model-independent loss function that enhances stereo image preservation in deep learning-based speech enhancement.
Findings
Improved stereo image preservation in enhanced speech.
Better subjective listening test scores.
Enhanced overall speech quality.
Abstract
Deep learning-based speech enhancement has shown unprecedented performance in recent years. The most popular mono speech enhancement frameworks are end-to-end networks that map the noisy mixture to an estimate of the clean speech. With growing computational power and the availability of multichannel microphone recordings, prior works have incorporated spatial statistics along with spectral information to boost performance. Despite improvements in the enhancement performance of the mono output, spatial image preservation and subjective evaluation have received little attention in the literature. This paper proposes a novel stereo-aware framework for speech enhancement, i.e., a training loss for deep learning-based speech enhancement that preserves the spatial image while enhancing the stereo mixture. The proposed framework is model independent, hence it can be applied to any deep…
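The abstract does not spell out the loss itself, so the following is only an illustrative sketch of what a stereo-aware training objective could look like, not the authors' formulation. It assumes a plain per-channel reconstruction term plus a penalty on the mismatch in interchannel level difference (ILD) between the enhanced and clean stereo signals. The function names, the frame length, and the `weight` parameter are all hypothetical.

```python
import numpy as np

def frame_energy(x, frame_len=512):
    # Split a 1-D signal into non-overlapping frames and return per-frame energy.
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sum(frames ** 2, axis=1)

def ild_db(left, right, frame_len=512, eps=1e-8):
    # Interchannel level difference per frame, in dB (left relative to right).
    e_left = frame_energy(left, frame_len)
    e_right = frame_energy(right, frame_len)
    return 10.0 * np.log10((e_left + eps) / (e_right + eps))

def stereo_aware_loss(est, ref, weight=0.1, frame_len=512):
    # est, ref: arrays of shape (2, num_samples) holding left/right channels.
    # Assumed form: reconstruction error + weight * ILD mismatch.
    recon = np.mean((est - ref) ** 2)
    ild_err = np.mean(np.abs(
        ild_db(est[0], est[1], frame_len) - ild_db(ref[0], ref[1], frame_len)
    ))
    return recon + weight * ild_err

# Toy usage: an estimate that collapses the stereo image to mono is penalized
# by the spatial term in addition to its reconstruction error.
rng = np.random.default_rng(0)
ref = np.stack([rng.standard_normal(4096), 0.3 * rng.standard_normal(4096)])
mono = np.tile(ref.mean(axis=0), (2, 1))
print(stereo_aware_loss(ref, ref), stereo_aware_loss(mono, ref))
```

Because the penalty depends only on the network output and the clean target, a term of this kind can be attached to any enhancement model, which is the sense in which such a loss is model independent. In an actual training setup the same computation would be written in a differentiable framework rather than NumPy.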
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Advanced Adaptive Filtering Techniques
