Spectral Masking with Explicit Time-Context Windowing for Neural Network-Based Monaural Speech Enhancement

Luan Vinícius Fiorio; Boris Karanov; Bruno Defraene; Johan David; Wim van Houtum; Frans Widdershoven; Ronald M. Aarts

arXiv:2408.15582 · eess.AS · August 29, 2024

Open Access

TL;DR

This paper introduces a simple yet effective method for neural speech enhancement that uses explicit time-context windowing to improve spectral masking, boosting speech intelligibility and quality with minimal additional parameters.

Contribution

The paper presents an approach that applies a time-context windowing function at both the input and the output of the network at inference time, improving soft mask estimation without additional layers or special training.

Findings

01 Improves speech intelligibility and quality in denoising tasks.

02 Increases the model's parameter count by less than 1%.

03 Is effective across different convolutional speech enhancement models.

Abstract

We propose and analyze the use of an explicit time-context window for neural network-based spectral masking speech enhancement to leverage signal context dependencies between neighboring frames. In particular, we concentrate on soft masking and loss computed on the time-frequency representation of the reconstructed speech. We show that the application of a time-context windowing function at both input and output of the neural network model improves the soft mask estimation process by combining multiple estimates taken from different contexts. The proposed approach is only applied as post-optimization in inference mode, not requiring additional layers or special training for the neural network model. Our results show that the method consistently increases both intelligibility and signal quality of the denoised speech, as demonstrated for two classes of convolutional-based speech…
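The idea of combining several mask estimates taken from different time contexts can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the window shape, the context length, or the combination rule, so the Hann-shaped window, `context_len`, `hop`, the weighted-average combination, and the stand-in estimator `dummy_mask_model` (a placeholder for a trained network) are all assumptions made for illustration.

```python
import numpy as np


def dummy_mask_model(context):
    """Hypothetical stand-in for a trained network.

    Maps an (F, K) block of noisy magnitudes to a soft mask in [0, 1]
    using a crude Wiener-like rule with a per-bin median noise floor.
    """
    noise_floor = np.median(context, axis=1, keepdims=True)
    return context**2 / (context**2 + noise_floor**2 + 1e-12)


def windowed_mask(noisy_mag, model, context_len=8, hop=1):
    """Combine overlapping per-context mask estimates (assumed scheme).

    noisy_mag: (F, T) magnitude spectrogram of the noisy signal.
    Each context of `context_len` frames is weighted by a window before
    the model (input windowing); the resulting mask is weighted by the
    same window (output windowing) and the overlapping estimates are
    averaged per frame.
    """
    n_frames = noisy_mag.shape[1]
    # Hann-shaped weights with the zero endpoints dropped, so all > 0.
    window = np.hanning(context_len + 2)[1:-1]
    mask_sum = np.zeros_like(noisy_mag)
    weight_sum = np.zeros(n_frames)

    for start in range(0, n_frames - context_len + 1, hop):
        frames = slice(start, start + context_len)
        estimate = model(noisy_mag[:, frames] * window)  # input windowing
        mask_sum[:, frames] += estimate * window         # output windowing
        weight_sum[frames] += window

    # Frames never covered by a context keep a pass-through mask of 1.
    mask = np.ones_like(noisy_mag)
    covered = weight_sum > 0
    mask[:, covered] = mask_sum[:, covered] / weight_sum[covered]
    return mask
```

With `hop=1`, each interior frame receives `context_len` estimates, one for every position it can occupy inside the context, and the window decides how much each position contributes. The enhanced magnitude would then be `windowed_mask(noisy_mag, model) * noisy_mag`, followed by an inverse STFT using the noisy phase, which is again an assumption about the surrounding pipeline.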

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research