Improving Frame-Online Neural Speech Enhancement with Overlapped-Frame Prediction

Zhong-Qiu Wang; Shinji Watanabe

arXiv:2204.07566·cs.SD·July 13, 2022

TL;DR

This paper introduces an overlapped-frame prediction method for real-time neural speech enhancement, enabling better use of future context and improving performance in noisy-reverberant conditions.

Contribution

It proposes an overlapped-frame prediction technique and a scale-aware loss function for improving frame-online speech enhancement models.
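The paper's exact loss is not reproduced on this page, so the following is only a minimal sketch of what a scale-aware loss could look like. It assumes the scale mismatch is handled by a single least-squares gain applied to the estimate before an L1 comparison. The function name `scale_compensated_l1`, the gain formula, and the choice of L1 are illustrative assumptions, not the authors' definition.

```python
import numpy as np

def scale_compensated_l1(est, target, eps=1e-8):
    """Hypothetical scale-aware loss: L1 error after least-squares gain matching.

    The gain alpha minimizes ||alpha * est - target||^2 (an assumption made for
    illustration; the paper's actual loss may differ).
    """
    alpha = np.dot(target, est) / (np.dot(est, est) + eps)
    loss = np.mean(np.abs(alpha * est - target))
    return loss, alpha

# An estimate that differs from the target only by a gain is not penalized.
target = np.array([1.0, -2.0, 3.0])
loss, alpha = scale_compensated_l1(0.5 * target, target)
```

Under this assumption, an estimate that matches the target up to a constant gain yields a near-zero loss, so the model is not penalized for scale alone.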

Findings

01 Improved speech enhancement performance in noisy-reverberant environments.

02 Effective utilization of future contextual information.

03 Enhanced model accuracy with the proposed loss function.

Abstract

Frame-online speech enhancement systems in the short-time Fourier transform (STFT) domain usually have an algorithmic latency equal to the window size due to the use of overlap-add in the inverse STFT (iSTFT). This algorithmic latency allows the enhancement models to leverage future contextual information up to a length equal to the window size. However, this information is only partially leveraged by current frame-online systems. To fully exploit it, we propose an overlapped-frame prediction technique for deep learning based frame-online speech enhancement, where at each frame our deep neural network (DNN) predicts the current and several past frames that are necessary for overlap-add, instead of only predicting the current frame. In addition, we propose a loss function to account for the scale difference between predicted and oracle target signals. Experiments on a noisy-reverberant…
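The overlap-add step described in the abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: it works on time-domain frames rather than STFT spectra, and the DNN is replaced by a hypothetical callback `predict(t)` that returns estimates of the current frame and the K - 1 past frames that overlap it. All names (`frame_signal`, `overlapped_frame_ola`, `oracle_predict`) and the sqrt-Hann window choice are assumptions made for the example.

```python
import numpy as np

def frame_signal(x, win_len, hop):
    """Split x into overlapping frames; trailing samples that do not fill a frame are dropped."""
    n_frames = 1 + (len(x) - win_len) // hop
    return np.stack([x[t * hop : t * hop + win_len] for t in range(n_frames)])

def overlapped_frame_ola(predict, n_frames, win_len, hop, ana_win, synth_win):
    """Frame-online overlap-add using K freshly predicted frames at every step.

    predict(t) must return an array of shape (K, win_len) whose row k is the
    estimate of frame t - k, made with the input available up to frame t.
    A conventional system would instead reuse the estimate of frame t - k
    that was produced back at step t - k.
    """
    K = win_len // hop  # number of frames overlapping any output segment
    # Per-sample sum of window products, used to undo the windowing gain.
    prod = ana_win * synth_win
    norm = sum(prod[k * hop : (k + 1) * hop] for k in range(K))
    segments = []
    for t in range(K - 1, n_frames):
        frames = predict(t)
        seg = np.zeros(hop)
        for k in range(K):
            # Frame t - k starts k hops earlier, so the segment sits at offset k * hop.
            seg += (synth_win * frames[k])[k * hop : (k + 1) * hop]
        segments.append(seg / norm)
    return np.concatenate(segments)

# Demo with an oracle "model" that returns the true windowed frames.
win_len, hop = 8, 2
K = win_len // hop
window = np.sqrt(np.hanning(win_len + 1)[:-1])  # periodic sqrt-Hann (assumed)
x = np.random.default_rng(0).standard_normal(40)
true_frames = frame_signal(x, win_len, hop)

def oracle_predict(t):
    # Row k is analysis-windowed frame t - k; a real DNN would estimate these.
    return np.stack([window * true_frames[t - k] for k in range(K)])

recon = overlapped_frame_ola(oracle_predict, len(true_frames), win_len, hop, window, window)
```

With the oracle callback, `recon` reproduces `x[(K - 1) * hop : len(true_frames) * hop]`, which checks the indexing. With a real model, the point of the scheme is that each past frame is re-estimated at step t using context up to frame t, while the output segment is still emitted with a latency of one window.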
