Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement

Guochen Yu; Andong Li; Wenzhe Liu; Chengshi Zheng; Yutian Wang; Hui Wang

arXiv:2203.16033·cs.SD·June 16, 2022·1 cites

TL;DR

This paper introduces a coordinated sub-band fusion network for real-time full-band speech enhancement. It recovers the low (0-8 kHz), middle (8-16 kHz), and high (16-24 kHz) bands step by step and reports better speech quality than existing full-band baselines.

Contribution

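It proposes a multi-stage network for full-band speech enhancement: a dual-stream network pretrained to recover the low-band complex spectrum, two magnitude-only sub-networks that suppress noise in the middle and high bands, and sub-band interaction modules that link them.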

Findings

01 Outperforms state-of-the-art full-band baselines in speech quality.

02 Effective recovery of low, middle, and high-frequency bands.

03 Real-time processing capability demonstrated.

Abstract

Because modeling more frequency bands is computationally expensive, real-time full-band speech enhancement based on deep neural networks is still intractable. Recent studies typically use compressed, perceptually motivated features with relatively low frequency resolution to filter the full-band spectrum with one-stage networks, which limits the achievable speech quality improvement. In this paper, we propose a coordinated sub-band fusion network for full-band speech enhancement, which aims to recover the low (0-8 kHz), middle (8-16 kHz), and high (16-24 kHz) bands in a step-wise manner. Specifically, a dual-stream network is first pretrained to recover the low-band complex spectrum, and two further sub-networks are designed as the middle- and high-band noise suppressors in the magnitude-only domain. To fully capitalize on the information intercommunication, we employ a…
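
The band layout in the abstract (low 0-8 kHz, middle 8-16 kHz, high 16-24 kHz) implies a 48 kHz sampling rate. As a rough illustration of what splitting a full-band spectrum into these three sub-bands might look like, here is a minimal NumPy sketch. The function names, the 960-point FFT, and the choice to place each boundary bin in the higher band are assumptions made for illustration; they are not taken from the paper or from the sf-net repository.

```python
import numpy as np


def split_sub_bands(spec, sample_rate=48000, edges_hz=(8000, 16000)):
    """Split a one-sided spectrum of shape (freq_bins, frames) into
    low, middle, and high sub-bands along the frequency axis.

    Hypothetical helper: the boundary handling is an assumption,
    not the paper's exact configuration.
    """
    n_bins = spec.shape[0]
    n_fft = 2 * (n_bins - 1)  # one-sided spectrum has n_fft / 2 + 1 bins
    # Bin k corresponds to k * sample_rate / n_fft Hz.
    lo_edge, hi_edge = (int(round(f * n_fft / sample_rate)) for f in edges_hz)
    low = spec[:lo_edge]          # 0-8 kHz
    mid = spec[lo_edge:hi_edge]   # 8-16 kHz
    high = spec[hi_edge:]         # 16-24 kHz, including the Nyquist bin
    return low, mid, high


def merge_sub_bands(low, mid, high):
    """Concatenate the sub-bands back into a full-band spectrum."""
    return np.concatenate([low, mid, high], axis=0)


# Usage: a random complex spectrum standing in for an STFT
# (960-point FFT is an assumed frame size, giving 481 bins).
rng = np.random.default_rng(0)
full = rng.standard_normal((481, 4)) + 1j * rng.standard_normal((481, 4))
low, mid, high = split_sub_bands(full)
restored = merge_sub_bands(low, mid, high)
```

With the assumed 960-point FFT, the split yields 160, 160, and 161 bins, and merging the three parts reproduces the input exactly. In the scheme the abstract describes, the low band would be processed as a complex spectrum, while the middle and high bands would be handled on their magnitudes only.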

Code & Models

Repositories

yuguochencuc/sf-net (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Speech Recognition and Synthesis