Improving Speech Enhancement by Integrating Inter-Channel and Band Features with Dual-branch Conformer

Jizhen Li; Xinmeng Xu; Weiping Tu; Yuhong Yang; Rong Zhu

arXiv:2407.06524 · cs.SD · July 16, 2024

TL;DR

This paper introduces a dual-branch conformer model that captures inter-channel and band features, exploiting correlations among feature channels to improve speech enhancement.

Contribution

The paper proposes a novel channel-aware dual-branch conformer architecture that models inter-channel and band features to better capture long-range correlations in speech spectrograms.

Findings

01

The proposed model outperforms recent methods on the DNS-Challenge 2020 dataset.

02

Leveraging channel features significantly improves speech enhancement.

03

The model achieves superior performance at an attractive computational cost.

Abstract

Recent speech enhancement methods based on convolutional neural networks (CNNs) and transformers have been shown to capture time-frequency (T-F) information on the spectrogram effectively. However, the correlation among the channels of speech features has not been explored. In theory, the channel maps of speech features produced by different convolution kernels contain information at different scales and are strongly correlated. To fill this gap, we propose a novel dual-branch architecture named channel-aware dual-branch conformer (CADB-Conformer), which explores the long-range time and frequency correlations among different channels, respectively, to extract channel-relation-aware time-frequency information. Ablation studies conducted on the DNS-Challenge 2020 dataset demonstrate the importance of leveraging channel features while showing the significance of…
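
The dual-branch idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' CADB-Conformer: it assumes a feature tensor of shape (channels, frames, frequency bins), uses plain scaled dot-product self-attention with no learned projections, omits the convolution and feed-forward modules of a conformer block, and fuses the two branches with a simple residual sum. The function names `axis_attention` and `dual_branch` are hypothetical. The only point it shows is how each time frame or frequency bin can be treated as a token whose feature vector spans all channels, so that attention along either axis is informed by inter-channel relations.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def axis_attention(seq):
    """Scaled dot-product self-attention without learned projections.

    seq has shape (batch, length, channels): each token is the vector of
    all channel values at one position, so the attention weights depend
    on how the channels co-vary along the sequence.
    """
    d = seq.shape[-1]
    scores = seq @ seq.transpose(0, 2, 1) / np.sqrt(d)  # (batch, length, length)
    return softmax(scores, axis=-1) @ seq               # (batch, length, channels)


def dual_branch(features):
    """Illustrative two-branch block over features of shape (C, T, F)."""
    # Time branch: for every frequency bin, attend across the T frames.
    time_in = features.transpose(2, 1, 0)                   # (F, T, C)
    time_out = axis_attention(time_in).transpose(2, 1, 0)   # back to (C, T, F)

    # Frequency branch: for every frame, attend across the F bins.
    freq_in = features.transpose(1, 2, 0)                   # (T, F, C)
    freq_out = axis_attention(freq_in).transpose(2, 0, 1)   # back to (C, T, F)

    # Residual sum as the fusion step (an assumption of this sketch).
    return features + time_out + freq_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 6, 5))  # 4 channels, 6 frames, 5 bins
    print(dual_branch(x).shape)
```

In a trained model the queries, keys, and values would come from learned projections and the fusion would likely be learned as well; the sketch keeps only the tensor layout that makes each branch channel-aware.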

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development