Low-latency Monaural Speech Enhancement with Deep Filter-bank Equalizer

Chengshi Zheng, Wenzhe Liu, Andong Li, Yuxuan Ke, and Xiaodong Li

arXiv:2202.06764 · eess.AS · June 1, 2022

TL;DR

This paper introduces a deep filter-bank equalizer (FBE) framework for monaural speech enhancement that keeps latency at or below 4 ms by combining a deep learning-based subband noise reduction network with a deep learning-based shortened digital filter mapping network.

Contribution

It proposes a deep learning-based framework that shortens the adaptive digital filter, enabling low-latency speech enhancement without overlap-add and outperforming traditional low-latency speech enhancement algorithms.

Findings

01

Achieved superior PESQ, STOI, and noise reduction metrics at 4 ms latency.

02

Demonstrated effectiveness on the WSJ0-SI84 corpus.

03

Outperformed traditional low-latency speech enhancement algorithms.

Abstract

It is highly desirable that speech enhancement algorithms can achieve good performance while keeping low latency for many applications, such as digital hearing aids, acoustically transparent hearing devices, and public address systems. To improve the performance of traditional low-latency speech enhancement algorithms, a deep filter-bank equalizer (FBE) framework was proposed, which integrated a deep learning-based subband noise reduction network with a deep learning-based shortened digital filter mapping network. In the first network, a deep learning model was trained with a controllable small frame shift to satisfy the low-latency demand, i.e., $\leq$ 4 ms, so as to obtain (complex) subband gains, which could be regarded as an adaptive digital filter in each frame. In the second network, to reduce the latency, this adaptive digital filter was implicitly shortened by a deep…
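
To make the idea of treating per-frame subband gains as an adaptive digital filter more concrete, here is a minimal NumPy sketch of a generic filter-bank-equalizer style pipeline. It is not the authors' implementation. The function names (`gains_to_fir`, `fbe_filter`, `group_delay_ms`), the 16 kHz sampling rate, and the 129-tap length are all assumptions chosen for illustration, and a simple windowed truncation stands in for the learned filter-shortening network described in the abstract. Real-valued gains are used for simplicity.

```python
import numpy as np

def gains_to_fir(gains, filt_len):
    """Map real subband gains (rfft bins) to a causal linear-phase FIR filter.

    Windowed truncation is used here as a stand-in for the learned
    shortening network; filt_len should be odd and <= 2 * (len(gains) - 1).
    """
    n_fft = 2 * (len(gains) - 1)
    h = np.fft.irfft(gains, n=n_fft)          # zero-phase response, peak at index 0
    h = np.roll(h, filt_len // 2)[:filt_len]  # centre the peak, keep filt_len taps
    return h * np.hanning(filt_len)           # taper to soften truncation effects

def fbe_filter(x, gains_per_frame, hop, filt_len):
    """Filter x in the time domain with a filter updated every `hop` samples.

    No overlap-add is involved: each output sample is a dot product of the
    most recent filt_len input samples with the current filter.
    """
    y = np.zeros(len(x))
    xp = np.concatenate([np.zeros(filt_len - 1), x])  # zero history before x[0]
    for m, gains in enumerate(gains_per_frame):
        start = m * hop
        if start >= len(x):
            break
        stop = min(start + hop, len(x))
        fir = gains_to_fir(gains, filt_len)
        seg = xp[start : stop + filt_len - 1]
        y[start:stop] = np.convolve(seg, fir, mode="valid")
    return y

def group_delay_ms(filt_len, fs):
    """Delay of a symmetric (linear-phase) FIR filter in milliseconds."""
    return (filt_len - 1) / 2 / fs * 1000.0

if __name__ == "__main__":
    fs = 16000                       # assumed sampling rate
    filt_len, hop = 129, 64          # assumed filter length and frame shift
    rng = np.random.default_rng(0)
    x = rng.standard_normal(10 * hop)
    gains = np.ones((10, 129))       # all-pass gains: output is a delayed copy
    y = fbe_filter(x, gains, hop, filt_len)
    print(np.allclose(y[64:], x[:-64]), group_delay_ms(filt_len, fs))
```

In this sketch the algorithmic delay comes from the filter's group delay, (filt_len - 1) / 2 samples, so a shorter filter directly means lower latency. How the paper itself defines and measures its 4 ms latency may differ from this simplified accounting.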
