DPT-FSNet: Dual-path Transformer Based Full-band and Sub-band Fusion Network for Speech Enhancement

Feng Dang; Hangting Chen; Pengyuan Zhang

arXiv:2104.13002 · cs.SD · January 26, 2022

TL;DR

This paper introduces DPT-FSNet, a dual-path transformer model that fuses full-band and sub-band information for speech enhancement, achieving superior results on the Voice Bank + DEMAND and Interspeech 2020 DNS datasets.

Contribution

The paper proposes a dual-path transformer architecture for full-band and sub-band fusion in frequency-domain speech enhancement, in which the intra and inter parts model sub-band and full-band information, respectively, giving more interpretable features and improved performance.

Findings

01

Outperforms state-of-the-art methods on the Voice Bank + DEMAND dataset.

02

Achieves superior speech enhancement results on the Interspeech 2020 Deep Noise Suppression (DNS) dataset.

03

Demonstrates the effectiveness of the dual-path transformer for full-band and sub-band fusion.

Abstract

Sub-band models have achieved promising results due to their ability to model local patterns in the spectrogram. Some studies further improve performance by fusing sub-band and full-band information. However, the structure of full-band and sub-band fusion models has not been fully explored. This paper proposes a dual-path transformer-based full-band and sub-band fusion network (DPT-FSNet) for speech enhancement in the frequency domain. The intra and inter parts of the dual-path transformer model sub-band and full-band information, respectively. The features used by our proposed method are more interpretable than those used by the time-domain dual-path transformer. We conducted experiments on the Voice Bank + DEMAND and Interspeech 2020 Deep Noise Suppression (DNS) datasets to evaluate the proposed method. Experimental results show that the proposed method outperforms the…
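The dual-path idea described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The axis assignment (the sub-band path runs along time within each frequency bin, the full-band path runs across all frequency bins within each frame), the identity query/key/value projections, and the helper names self_attention, add_residual, and dual_path_block are all assumptions made for illustration. A trained transformer layer would also include learned projections, multiple heads, a feed-forward sublayer, and normalization.

```python
import math
from typing import Callable, List

Vector = List[float]
Grid = List[List[Vector]]  # grid[t][f] is the feature vector at frame t, frequency bin f


def self_attention(seq: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    Simplification (assumption): queries, keys, and values are the inputs
    themselves; a trained transformer layer learns these projections.
    """
    d = len(seq[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in seq:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in seq]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[i] for w, v in zip(weights, seq)) for i in range(d)])
    return out


def add_residual(seq: List[Vector], layer: Callable[[List[Vector]], List[Vector]]) -> List[Vector]:
    """Apply `layer` to a sequence and add a residual connection."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(seq, layer(seq))]


def dual_path_block(grid: Grid) -> Grid:
    """One hypothetical dual-path block over a [time][frequency] feature grid."""
    n_t, n_f = len(grid), len(grid[0])
    out = [[list(v) for v in frame] for frame in grid]  # copy; leave the input untouched

    # Path 1 (assumed sub-band path): each frequency bin is modeled
    # independently as a sequence over time.
    for f in range(n_f):
        column = [out[t][f] for t in range(n_t)]
        new_column = add_residual(column, self_attention)
        for t in range(n_t):
            out[t][f] = new_column[t]

    # Path 2 (assumed full-band path): each frame is modeled as a sequence
    # over all frequency bins, so every bin can attend to the whole band.
    return [add_residual(frame, self_attention) for frame in out]


if __name__ == "__main__":
    # Toy input: 3 frames x 4 frequency bins, 2 features per time-frequency cell.
    x = [[[0.1 * t, 0.2 * f] for f in range(4)] for t in range(3)]
    y = dual_path_block(x)
    print(len(y), len(y[0]), len(y[0][0]))
```

Because each path preserves the [time][frequency] layout, blocks like this can be stacked; dual-path models typically deepen the network by repeating such blocks. Changing which axis counts as the sub-band path only changes which loop runs first.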

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing