DBNet: A Dual-branch Network Architecture Processing on Spectrum and Waveform for Single-channel Speech Enhancement

Kanghao Zhang; Shulin He; Hao Li; Xueliang Zhang

arXiv:2105.02436 · cs.SD · May 7, 2021 · 1 cites

TL;DR

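DBNet is a real-time dual-branch network for single-channel speech enhancement: one branch models the spectrum, the other models the waveform, and a bridge layer exchanges information between them. The authors report that it substantially outperforms related algorithms in very challenging noisy and reverberant conditions.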

Contribution

Introduces a novel dual-branch architecture with interconnections for spectrum and waveform modeling in speech enhancement.

Findings

01

Outperforms existing algorithms in challenging environments.

02

Ranks in the top 8 of real-time track 1 in the INTERSPEECH 2021 Deep Noise Suppression (DNS) challenge.

03

Achieves high mean opinion scores (MOS) in subjective evaluations.

Abstract

In real acoustic environments, speech enhancement, which aims to improve the quality and intelligibility of speech corrupted by background noise and reverberation, is an arduous task. In recent years, deep learning has shown great potential for speech enhancement. In this paper, we propose a novel real-time framework called DBNet, a dual-branch structure with alternate interconnection. Each branch uses an encoder-decoder architecture with skip connections. The two branches are responsible for spectrum and waveform modeling, respectively. A bridge layer is adopted to exchange information between the two branches. Systematic evaluation and comparison show that the proposed system substantially outperforms related algorithms under very challenging environments. In the INTERSPEECH 2021 Deep Noise Suppression (DNS) challenge, the proposed system ranks in the top 8 in real-time track 1 in…
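
The structure described in the abstract can be illustrated with a toy data-flow sketch in Python. This is not the authors' implementation. The fixed-gain `toy_stage` stands in for learned encoder and decoder layers, the naive DFT pair stands in for whatever domain mapping the bridge layer actually uses, and placing a bridge after every stage is an assumption, since the abstract does not give these details. All function names are hypothetical.

```python
import cmath


def dft(frame):
    """Naive discrete Fourier transform of one frame (waveform -> spectrum)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def idft(spec):
    """Naive inverse DFT, keeping the real part (spectrum -> waveform)."""
    n = len(spec)
    return [
        (sum(c * cmath.exp(2j * cmath.pi * k * t / n) for k, c in enumerate(spec)) / n).real
        for t in range(n)
    ]


def toy_stage(features, gain=0.5):
    """Hypothetical stand-in for a learned layer: a fixed scalar gain."""
    return [gain * v for v in features]


def bridge(wave_feat, spec_feat):
    """Assumed bridge: each branch averages in the other branch's features,
    mapped into its own domain."""
    spec_as_wave = idft(spec_feat)
    wave_as_spec = dft(wave_feat)
    new_wave = [0.5 * (a + b) for a, b in zip(wave_feat, spec_as_wave)]
    new_spec = [0.5 * (a + b) for a, b in zip(spec_feat, wave_as_spec)]
    return new_wave, new_spec


def dbnet_like_forward(frame, depth=2):
    """Run one frame through two toy encoder-decoder branches with skips."""
    wave = list(frame)   # waveform branch input
    spec = dft(frame)    # spectrum branch input
    wave_skips, spec_skips = [], []

    # Encoder: process each branch, store skip features, then exchange.
    for _ in range(depth):
        wave, spec = toy_stage(wave), toy_stage(spec)
        wave_skips.append(wave)
        spec_skips.append(spec)
        wave, spec = bridge(wave, spec)

    # Decoder: add the matching skip features, process, then exchange.
    for _ in range(depth):
        wave = [a + b for a, b in zip(wave, wave_skips.pop())]
        spec = [a + b for a, b in zip(spec, spec_skips.pop())]
        wave, spec = toy_stage(wave), toy_stage(spec)
        wave, spec = bridge(wave, spec)

    # Return both branch outputs in the time domain.
    return wave, idft(spec)


if __name__ == "__main__":
    frame = [0.0, 1.0, 0.0, -1.0, 0.5, -0.5, 0.25, 0.0]
    wave_out, spec_out = dbnet_like_forward(frame)
    print(wave_out)
    print(spec_out)
```

Because every toy stage is linear and both branches start from the same frame, the two outputs coincide in this sketch. With learned, nonlinear stages they generally would not.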

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Speech Recognition and Synthesis