Low-latency Monaural Speech Enhancement with Deep Filter-bank Equalizer
Chengshi Zheng, Wenzhe Liu, Andong Li, Yuxuan Ke, and Xiaodong Li

TL;DR
This paper introduces a deep filter-bank equalizer (FBE) framework for monaural speech enhancement that targets a latency of 4 ms. It pairs a deep learning-based subband noise reduction network with a second deep learning-based network that maps the resulting adaptive filter to a shortened digital filter.
Contribution
The paper proposes a deep learning-based framework that implicitly shortens the adaptive digital filter, enabling low-latency speech enhancement without overlap-add and outperforming traditional low-latency speech enhancement algorithms.
Findings
Achieved superior PESQ, STOI, and noise reduction metrics at 4 ms latency.
Demonstrated effectiveness on the WSJ0-SI84 corpus.
Outperformed traditional low-latency speech enhancement algorithms.
Abstract
It is highly desirable that speech enhancement algorithms can achieve good performance while keeping low latency for many applications, such as digital hearing aids, acoustically transparent hearing devices, and public address systems. To improve the performance of traditional low-latency speech enhancement algorithms, a deep filter-bank equalizer (FBE) framework was proposed, which integrated a deep learning-based subband noise reduction network with a deep learning-based shortened digital filter mapping network. In the first network, a deep learning model was trained with a controllable small frame shift to satisfy the low-latency demand, i.e., 4 ms, so as to obtain (complex) subband gains, which could be regarded as an adaptive digital filter in each frame. In the second network, to reduce the latency, this adaptive digital filter was implicitly shortened by a deep…
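To make the idea of "subband gains regarded as an adaptive digital filter" concrete, here is a minimal sketch of the classical filter-bank equalizer step that the abstract builds on. It is not the paper's method: the paper replaces the shortening step with a learned mapping network, whereas this sketch uses plain truncation and a Hann window. The function names `gains_to_fir` and `shorten`, the 16 kHz sampling rate, the 65 subbands, and the 33-tap target length are all assumptions chosen for illustration.

```python
import cmath
import math

SAMPLE_RATE_HZ = 16000  # assumption: not stated in the text above
FRAME_SHIFT_MS = 4      # the latency target named in the abstract


def gains_to_fir(gains):
    """Turn real subband gains for bins 0..K/2 into a causal length-K FIR filter.

    The gains are mirrored into a symmetric spectrum, passed through an
    inverse DFT, and circularly shifted by K/2 so the peak sits mid-filter.
    """
    half = len(gains) - 1
    k = 2 * half
    full = list(gains) + list(gains[-2:0:-1])  # symmetric full spectrum
    h = [
        sum(full[m] * cmath.exp(2j * math.pi * m * n / k) for m in range(k)).real / k
        for n in range(k)
    ]
    return h[half:] + h[:half]


def shorten(h, length):
    """Keep `length` taps centred on the filter peak and apply a Hann window.

    This is the hand-crafted stand-in for the learned shortening network.
    `length` is assumed odd so that the peak falls on the centre tap.
    """
    start = len(h) // 2 - length // 2
    segment = h[start:start + length]
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * (i + 1) / (length + 1))
        for i in range(length)
    ]
    return [s * w for s, w in zip(segment, window)]


# Frame shift in samples under the assumed sampling rate.
frame_shift = FRAME_SHIFT_MS * SAMPLE_RATE_HZ // 1000

# Example: an all-pass gain vector (no noise reduction applied).
example_gains = [1.0] * 65
long_filter = gains_to_fir(example_gains)
short_filter = shorten(long_filter, 33)

# A symmetric FIR filter of odd length N delays the signal by (N - 1) / 2 samples.
delay_ms = 1000 * (len(short_filter) - 1) / 2 / SAMPLE_RATE_HZ
```

Under these assumptions the full filter has 128 taps and the shortened one has 33, so the filter's own delay drops from 64 samples to 16. The sketch is only meant to show why shortening the per-frame filter matters for latency; how well a shortened filter preserves the intended gains is the part the paper addresses with a second network.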
