THLNet: two-stage heterogeneous lightweight network for monaural speech enhancement

Feng Dang; Qi Hu; Pengyuan Zhang

arXiv:2301.07939 · cs.SD · May 22, 2023 · 1 citation

TL;DR

This paper introduces THLNet, a two-stage lightweight neural network for monaural speech enhancement that combines a novel learnable filter bank with a heterogeneous structure to improve performance while maintaining efficiency.
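To make the "heterogeneous structure" more concrete, the Python sketch below traces the frequency resolution through the two kinds of backbone the abstract names: a U-shaped subnetwork (encoder/decoder with skip connections), used for CoarseNet, and a single-scale subnetwork, used for FineNet. It only tracks sizes, not weights. The helper names, the depth, and the stride-2 downsampling per level are illustrative assumptions, not the paper's configuration.

```python
# Shape-only sketch. Hypothetical helpers; depth and stride-2 downsampling
# are illustrative assumptions, not THLNet's actual settings.

def u_shaped_trace(freq_bins, depth):
    """Frequency size after each stage of a U-shaped encoder/decoder."""
    trace, skips, size = [], [], freq_bins
    for level in range(depth):
        skips.append(size)          # encoder output kept for the skip connection
        size = (size + 1) // 2      # strided conv halves the frequency axis (ceil)
        trace.append(("enc", level, size))
    for level in reversed(range(depth)):
        size = skips[level]         # upsampling restores the matching skip size
        trace.append(("dec", level, size))
    return trace


def single_scale_trace(freq_bins, depth):
    """Frequency size after each block of a single-scale stack (no resampling)."""
    return [("block", i, freq_bins) for i in range(depth)]


print(u_shaped_trace(64, 3))
# [('enc', 0, 32), ('enc', 1, 16), ('enc', 2, 8), ('dec', 2, 16), ('dec', 1, 32), ('dec', 0, 64)]
print(single_scale_trace(64, 3))
```

One plausible reading of the design: the U-shaped path aggregates context across a wide frequency range, which suits a coarse full-band mask, while the single-scale path never discards resolution, which suits fine low-frequency refinement. This is an interpretation, not the authors' stated rationale.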

Contribution

The paper presents a novel two-stage framework that combines a learnable complex-valued filter bank with heterogeneous subnetworks, aiming to improve enhancement quality while keeping the model small and computationally cheap.
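The filter-bank idea can be sketched as a pair of complex matrices: one compresses the STFT bins of a frame into a smaller number of band features, and the other expands the bands back to bins. This is an assumption-laden illustration, not the LCRB definition from the paper. It spaces band edges on the Glasberg and Moore ERB-rate scale, gives each band a rectangular (boxcar) response, and uses these rectangular values only as the initialization of weights that a learnable bank would train. `init_rect_bank` and `matvec` are hypothetical helpers, and the 512-point FFT at 16 kHz with 32 bands are arbitrary example settings.

```python
import math

# Assumed band spacing: Glasberg & Moore ERB-rate scale.
def hz_to_erb(f_hz):
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_to_hz(erb):
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437

def init_rect_bank(n_fft, sample_rate, n_bands):
    """Return (analysis, synthesis) complex matrices (hypothetical helper).

    analysis:  n_bands x n_bins, row b averages the bins inside band b
    synthesis: n_bins x n_bands, copies each band value back to its bins
    In a learnable bank both would be trainable complex parameters;
    here they only hold the rectangular initialization."""
    n_bins = n_fft // 2 + 1
    top = hz_to_erb(sample_rate / 2)
    upper_edges = [erb_to_hz(top * (i + 1) / n_bands) for i in range(n_bands)]
    # Assign each bin to the first band whose upper edge lies above it;
    # the Nyquist bin falls back to the last band.
    band_of = []
    for k in range(n_bins):
        f = k * sample_rate / n_fft
        band_of.append(next((b for b, e in enumerate(upper_edges) if f < e), n_bands - 1))
    widths = [band_of.count(b) for b in range(n_bands)]
    # 1/width is evaluated only for bins inside band b, so widths[b] >= 1 there.
    analysis = [[complex(1.0 / widths[b]) if band_of[k] == b else 0j
                 for k in range(n_bins)] for b in range(n_bands)]
    synthesis = [[1 + 0j if band_of[k] == b else 0j for b in range(n_bands)]
                 for k in range(n_bins)]
    return analysis, synthesis

def matvec(matrix, vector):
    """Complex matrix-vector product in plain Python."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

analysis, synthesis = init_rect_bank(n_fft=512, sample_rate=16000, n_bands=32)
spectrum = [1 + 1j] * 257             # one toy STFT frame (257 bins)
bands = matvec(analysis, spectrum)    # 257 bins -> 32 compact band features
recon = matvec(synthesis, bands)      # 32 bands -> 257 bins
```

For a spectrum that is constant within each band, as in this toy frame, the round trip is exact; otherwise the compression discards detail inside each band.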

Findings

1. Outperforms state-of-the-art methods on the VoiceBank + DEMAND and DNS datasets.
2. Maintains a small model size and low computational complexity.
3. The two-stage approach, with a specialized subnetwork for each stage, improves speech quality.

Abstract

In this paper, we propose a two-stage heterogeneous lightweight network for monaural speech enhancement. Specifically, we design a novel two-stage framework consisting of a coarse-grained full-band mask estimation stage and a fine-grained low-frequency refinement stage. Instead of a hand-designed real-valued filter bank, we use a novel learnable complex-valued rectangular bandwidth (LCRB) filter bank as an extractor of compact features. Furthermore, to suit the different characteristics of the two stages, we use a heterogeneous structure: a U-shaped subnetwork as the backbone of CoarseNet and a single-scale subnetwork as the backbone of FineNet. We conducted experiments on the VoiceBank + DEMAND and DNS datasets to evaluate the proposed approach. The experimental results show that the proposed method outperforms the current state-of-the-art methods, while…
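The coarse-to-fine data flow described in the abstract can be sketched for a single STFT frame as follows. `two_stage_enhance` is a hypothetical function and the two networks are replaced by stand-in callables. The choices that the coarse stage outputs one complex mask value per bin, that the fine stage receives both the coarse estimate and the noisy low band, and that it rewrites only the first `n_low` bins are assumptions made for illustration.

```python
def two_stage_enhance(noisy, coarse_net, fine_net, n_low):
    """Hypothetical coarse-to-fine pass over one frame of complex STFT bins."""
    # Stage 1 (CoarseNet role): one mask value per bin over the full band.
    mask = coarse_net(noisy)
    coarse = [m * x for m, x in zip(mask, noisy)]
    # Stage 2 (FineNet role): re-estimate only the low-frequency bins.
    refined_low = fine_net(coarse[:n_low], noisy[:n_low])
    # High-frequency bins keep the coarse estimate unchanged.
    return refined_low + coarse[n_low:]


def coarse_stub(spectrum):
    """Stand-in for a trained CoarseNet: a constant 0.5 mask."""
    return [0.5 + 0j] * len(spectrum)


def fine_stub(coarse_low, noisy_low):
    """Stand-in for a trained FineNet: doubles the coarse low band
    (ignores noisy_low; a real network could use it)."""
    return [2 * c for c in coarse_low]


noisy = [complex(k, -k) for k in range(8)]
enhanced = two_stage_enhance(noisy, coarse_stub, fine_stub, n_low=3)
print(enhanced)
```

Because the second stage only processes `n_low` bins, its input is much smaller than the full spectrum, which fits the lightweight goal; the actual split frequency used in the paper is not given here.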

Taxonomy

Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Speech Recognition and Synthesis