A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement
Shenghui Lu, Hukai Huang, Jinanglong Yao, Kaidi Wang, Qingyang Hong, Lin Li

TL;DR
This paper introduces a two-stage hierarchical deep filtering framework for real-time single-channel speech enhancement. It combines sub-band processing at the input with deep filtering, decoupled into temporal and frequency components, at the output, and reports better enhancement quality at lower complexity.
Contribution
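The paper presents a two-stage hierarchical deep filtering model (HDF-Net) and a new TAConv module that strengthens convolutional feature extraction. The model exploits information from the target time-frequency bin and its surrounding bins, and splitting the filtering into two stages reduces the complexity of filter coefficient prediction at each stage.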
Findings
Outperforms existing systems in speech enhancement quality.
Uses fewer resources while maintaining high performance.
Effectively exploits surrounding time-frequency bin information.
Abstract
This paper proposes a model that integrates sub-band processing and deep filtering to fully exploit information from the target time-frequency (TF) bin and its surrounding TF bins for single-channel speech enhancement. The sub-band module captures surrounding frequency bin information at the input, while the deep filtering module applies filtering at the output to both the target TF bin and its surrounding TF bins. To further improve the model performance, we decouple deep filtering into temporal and frequency components and introduce a two-stage framework, reducing the complexity of filter coefficient prediction at each stage. Additionally, we propose the TAConv module to strengthen convolutional feature extraction. Experimental results demonstrate that the proposed hierarchical deep filtering network (HDF-Net) effectively utilizes surrounding TF bin information and outperforms other…
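To make the idea of decoupled deep filtering concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes complex filter coefficients have already been predicted by some network, that the temporal filter is causal (it only uses the current and past frames), that the frequency filter is symmetric around the target bin with zero padding at the band edges, and that the temporal stage runs before the frequency stage. The function names, tensor shapes, and stage ordering are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def temporal_deep_filter(spec, coefs):
    """Causal temporal deep filtering (illustrative sketch).

    spec:  complex array, shape (T, F), the noisy spectrogram.
    coefs: complex array, shape (T, F, N); tap n weights frame t - n.
    """
    T, F, N = coefs.shape
    out = np.zeros((T, F), dtype=complex)
    for n in range(N):
        if n >= T:
            break  # no frames that far in the past
        shifted = np.zeros((T, F), dtype=complex)
        shifted[n:] = spec[:T - n]  # frame t - n aligned to frame t
        out += coefs[:, :, n] * shifted
    return out


def frequency_deep_filter(spec, coefs):
    """Frequency deep filtering over neighboring bins (illustrative sketch).

    spec:  complex array, shape (T, F).
    coefs: complex array, shape (T, F, 2K + 1); tap k weights bin f + k - K.
    """
    T, F, M = coefs.shape
    K = M // 2
    padded = np.pad(spec, ((0, 0), (K, K)))  # zero-pad the frequency axis
    out = np.zeros((T, F), dtype=complex)
    for k in range(M):
        out += coefs[:, :, k] * padded[:, k:k + F]
    return out


def hierarchical_deep_filter(spec, temporal_coefs, frequency_coefs):
    """Apply the two filters in sequence (the ordering is an assumption)."""
    stage1 = temporal_deep_filter(spec, temporal_coefs)
    return frequency_deep_filter(stage1, frequency_coefs)
```

With N temporal taps and 2K + 1 frequency taps, each stage predicts N or 2K + 1 coefficients per TF bin, whereas a joint two-dimensional filter over the same neighborhood would need N × (2K + 1). This is one way to read the abstract's claim that decoupling reduces the complexity of filter coefficient prediction at each stage.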
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques
