Binaural Speech Enhancement Using STOI-Optimal Masks
Vikas Tokala, Mike Brookes, Patrick A. Naylor

TL;DR
This paper extends STOI-optimal masking, previously developed for single-channel speech enhancement, to binaural speech enhancement. It improves SNR while preserving binaural (spatial) cues and speech intelligibility in directional noise.
Contribution
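It introduces a binaural masking approach that estimates a STOI-optimal mask for each channel and combines the two into a 'better-ear listening' mask by taking their per-bin maximum. The combined mask supplies speech-presence probabilities to an OM-LSA enhancer, so speech is enhanced without losing spatial information.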
Findings
Improved SNR in binaural speech signals.
Preservation of binaural cues and speech intelligibility.
Demonstrated on binaural signals corrupted by directional noise.
Abstract
STOI-optimal masking has been previously proposed and developed for single-channel speech enhancement. In this paper, we consider the extension to the task of binaural speech enhancement in which spatial information is known to be important to speech understanding and therefore should be preserved by the enhancement processing. Masks are estimated for each of the binaural channels individually and a 'better-ear listening' mask is computed by choosing the maximum of the two masks. The estimated mask is used to supply probability information about the speech presence in each time-frequency bin to an Optimally-modified Log Spectral Amplitude (OM-LSA) enhancer. We show that using the proposed method for binaural signals with a directional noise not only improves the SNR of the noisy signal but also preserves the binaural cues and intelligibility.
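The abstract describes two processing steps: a per-bin maximum of the left and right masks, and the use of that combined mask as the speech-presence probability inside OM-LSA. The following is a minimal NumPy sketch of both steps under stated assumptions, not the authors' implementation. It assumes masks are arrays of shape (frames, bins) with values in [0, 1] and that the speech-present LSA gain `G_H1` has already been computed elsewhere. The function names `better_ear_mask` and `omlsa_gain` and the -25 dB gain floor are hypothetical choices for illustration. The combination rule G = G_H1^p · G_min^(1-p) is the standard OM-LSA gain.

```python
import numpy as np


def better_ear_mask(mask_left: np.ndarray, mask_right: np.ndarray) -> np.ndarray:
    """Combine per-channel masks (values in [0, 1]) by taking the maximum
    in each time-frequency bin ('better-ear listening' mask)."""
    if mask_left.shape != mask_right.shape:
        raise ValueError("masks must have the same (frames, bins) shape")
    return np.maximum(mask_left, mask_right)


def omlsa_gain(g_h1: np.ndarray, p_speech: np.ndarray,
               g_min_db: float = -25.0) -> np.ndarray:
    """OM-LSA gain rule G = G_H1**p * G_min**(1 - p).

    g_h1:     positive LSA gain under the speech-present hypothesis
              (assumed to be precomputed; not derived here).
    p_speech: speech-presence probability per bin; in this sketch it is
              the better-ear mask.
    g_min_db: spectral gain floor in dB; -25 dB is an assumed value,
              not taken from the paper.
    """
    g_min = 10.0 ** (g_min_db / 20.0)     # convert the dB floor to linear gain
    p = np.clip(p_speech, 0.0, 1.0)       # guard against out-of-range mask values
    return g_h1 ** p * g_min ** (1.0 - p)


# Toy example: 2 frames x 2 frequency bins.
m_left = np.array([[0.2, 0.9], [0.0, 0.5]])
m_right = np.array([[0.6, 0.1], [0.0, 0.5]])
m = better_ear_mask(m_left, m_right)      # [[0.6, 0.9], [0.0, 0.5]]
gain = omlsa_gain(np.ones_like(m), m)     # bins with p = 0 fall to the floor
```

Where a bin's combined mask is near 1, the gain approaches `G_H1`; where it is near 0, the gain approaches the floor `G_min`. The resulting gain would then multiply the noisy STFT of each ear; whether the paper uses one shared gain or per-channel gains is not stated in the abstract.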
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Indoor and Outdoor Localization Technologies
