THLNet: two-stage heterogeneous lightweight network for monaural speech enhancement
Feng Dang, Qi Hu, Pengyuan Zhang

TL;DR
This paper introduces THLNet, a two-stage lightweight neural network for monaural speech enhancement that combines a novel learnable filter bank with a heterogeneous structure to improve performance while maintaining efficiency.
Contribution
The paper presents a two-stage framework that pairs a learnable complex-valued rectangular bandwidth (LCRB) filter bank with heterogeneous subnetworks: a U-shaped CoarseNet for coarse-grained full-band mask estimation and a single-scale FineNet for fine-grained low-frequency refinement.
Findings
Outperforms state-of-the-art methods on VoiceBank + DEMAND and DNS datasets.
Maintains small model size and low computational complexity.
Uses a two-stage design with a specialized subnetwork for each stage (CoarseNet and FineNet).
Abstract
In this paper, we propose a two-stage heterogeneous lightweight network for monaural speech enhancement. Specifically, we design a novel two-stage framework consisting of a coarse-grained full-band mask estimation stage and a fine-grained low-frequency refinement stage. Instead of using a hand-designed real-valued filter, we use a novel learnable complex-valued rectangular bandwidth (LCRB) filter bank as an extractor of compact features. Furthermore, considering the respective characteristics of the proposed two-stage task, we use a heterogeneous structure, i.e., a U-shaped subnetwork as the backbone of CoarseNet and a single-scale subnetwork as the backbone of FineNet. We conducted experiments on the VoiceBank + DEMAND and DNS datasets to evaluate the proposed approach. The experimental results show that the proposed method outperforms the current state-of-the-art methods, while…
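The two-stage data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the functions `toy_mask` and `toy_refine` are hypothetical stand-ins for the trained CoarseNet and FineNet, the residual form of the refinement and the cutoff `n_low=64` are assumptions, and the LCRB filter bank is omitted entirely. The sketch only shows a mask applied to all frequency bins followed by a correction restricted to the low-frequency bins.

```python
import numpy as np

def coarse_stage(noisy_spec, mask_fn):
    """Stage 1: estimate a mask over all frequency bins and apply it."""
    mask = mask_fn(np.abs(noisy_spec))      # real-valued, shape (freq, time)
    return mask * noisy_spec

def fine_stage(coarse_spec, refine_fn, n_low):
    """Stage 2: add a complex residual to the lowest n_low bins only (assumed form)."""
    refined = coarse_spec.copy()
    refined[:n_low] += refine_fn(coarse_spec[:n_low])
    return refined

# Hypothetical stand-ins for the trained subnetworks.
def toy_mask(mag, floor=0.5):
    return mag / (mag + floor)              # simple soft gain in [0, 1)

def toy_refine(low_spec):
    return 0.1 * low_spec                   # small complex correction

# Random complex spectrogram: 257 frequency bins, 10 frames.
rng = np.random.default_rng(0)
noisy = rng.standard_normal((257, 10)) + 1j * rng.standard_normal((257, 10))

coarse = coarse_stage(noisy, toy_mask)
enhanced = fine_stage(coarse, toy_refine, n_low=64)
```

In this sketch the bins above `n_low` pass through the second stage unchanged, which mirrors the idea that the refinement stage concentrates its capacity on the low-frequency region.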
Taxonomy
Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Speech Recognition and Synthesis
