Improving Frame-Online Neural Speech Enhancement with Overlapped-Frame Prediction
Zhong-Qiu Wang, Shinji Watanabe

TL;DR
This paper introduces overlapped-frame prediction for frame-online (real-time) neural speech enhancement. The technique lets the model make fuller use of the future context that its window-length algorithmic latency already allows, and it improves performance in noisy-reverberant conditions.
Contribution
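It proposes two additions for frame-online speech enhancement models. The first is an overlapped-frame prediction technique, in which the DNN predicts the current frame and the several past frames needed for overlap-add at every step. The second is a loss function that accounts for the scale difference between predicted and oracle target signals.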
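The abstract does not say how the loss accounts for the scale difference. As one generic illustration, the sketch below first rescales the target by the least-squares gain that best matches the estimate, as in SI-SDR, and then takes a mean absolute error. This is an assumption and not necessarily the loss proposed in the paper. The name `scale_compensated_l1` is hypothetical.

```python
import numpy as np


def scale_compensated_l1(estimate, target, eps=1e-8):
    """Hypothetical scale-compensated L1 loss (not the paper's exact formula).

    The target is rescaled by the least-squares gain alpha that best fits the
    estimate, so a pure gain mismatch between the two signals is not penalised.
    The remaining error is measured with a mean absolute error.
    """
    estimate = np.asarray(estimate, dtype=float)
    target = np.asarray(target, dtype=float)
    # alpha = argmin_a ||estimate - a * target||^2
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    return np.mean(np.abs(estimate - alpha * target))
```

A loss of this kind ignores overall gain, so it places no constraint on the output level of a model trained with it.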
Findings
Improved speech enhancement performance in noisy-reverberant environments.
Fuller use of the future context (up to one window length) that frame-online STFT-domain systems already have access to.
Further gains from the proposed scale-aware loss function.
Abstract
Frame-online speech enhancement systems in the short-time Fourier transform (STFT) domain usually have an algorithmic latency equal to the window size due to the use of overlap-add in the inverse STFT (iSTFT). This algorithmic latency allows the enhancement models to leverage future contextual information up to a length equal to the window size. However, this information is only partially leveraged by current frame-online systems. To fully exploit it, we propose an overlapped-frame prediction technique for deep learning based frame-online speech enhancement, where at each frame our deep neural network (DNN) predicts the current and several past frames that are necessary for overlap-add, instead of only predicting the current frame. In addition, we propose a loss function to account for the scale difference between predicted and oracle target signals. Experiments on a noisy-reverberant…
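The following is a minimal streaming sketch of the overlap-add bookkeeping described in the abstract, under stated assumptions. With window length `n_fft` and hop `hop`, each output hop `[t*hop, t*hop + hop)` is covered by the K = n_fft / hop frames t-K+1..t. That hop is final only once frame t has arrived, which gives the window-size algorithmic latency. Under overlapped-frame prediction, all K of those frames are predicted again at frame t from the input up to frame t, instead of reusing predictions made at earlier frames. The function `dummy_model` is a hypothetical identity stand-in for the DNN, and the square-root Hann analysis window with its matching synthesis window is an assumed choice. With the identity model the output should equal the input, which only checks the indexing.

```python
import numpy as np


def sqrt_hann(n_fft):
    # Square root of a periodic Hann window, used as the analysis window.
    n = np.arange(n_fft)
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft))


def dummy_model(past_spectra, k):
    # Hypothetical DNN stand-in: sees frames 0..t only and returns k
    # "enhanced" spectra for frames t-k+1..t. Here it is the identity.
    return past_spectra[-k:]


def overlapped_frame_stream(x, n_fft=512, hop=128):
    k = n_fft // hop                       # frames covering each output sample
    w_ana = sqrt_hann(n_fft)
    # Shifted periodic Hann windows sum to n_fft / (2 * hop) when hop divides
    # n_fft and hop < n_fft, so this synthesis window gives perfect reconstruction.
    w_syn = w_ana / (n_fft / (2 * hop))
    pad = n_fft - hop
    extra = (-len(x)) % hop                # round the signal up to whole hops
    xp = np.concatenate([np.zeros(pad), x, np.zeros(pad + extra)])
    n_frames = (len(xp) - n_fft) // hop + 1
    spectra = []
    out = np.zeros(len(xp))
    for t in range(n_frames):
        # Frame t arrives: streaming STFT analysis of this frame only.
        spectra.append(np.fft.rfft(w_ana * xp[t * hop : t * hop + n_fft]))
        if t < k - 1:
            continue                       # not enough overlapping frames yet
        # Re-predict all k frames overlapping hop t, using input up to frame t.
        preds = dummy_model(spectra, k)
        seg = np.zeros(hop)
        for i, spec in enumerate(preds):
            j = t - k + 1 + i              # frame index of this prediction
            frame = np.fft.irfft(spec, n=n_fft) * w_syn
            off = (t - j) * hop            # position of hop t inside frame j
            seg += frame[off : off + hop]
        out[t * hop : t * hop + hop] = seg  # this hop is final and can be emitted
    return out[pad : pad + len(x)]
```

The first sample of each emitted hop waits a full window before output. For example, `n_fft=512` at a 16 kHz sampling rate corresponds to 32 ms of algorithmic latency. A conventional frame-online system would instead store each frame's prediction when that frame arrives and reuse it in later overlap-adds. Those reused predictions for frames t-k+1..t-1 would then have seen less of the input that is available by the time hop t is emitted.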
