Multi-channel Speech Enhancement with 2-D Convolutional Time-frequency Domain Features and a Pre-trained Acoustic Model

Quandong Wang; Junnan Wu; Zhao Yan; Sichong Qian; Liyong Guo; Lichun Fan; Weiji Zhuang; Peng Gao; Yujun Wang

arXiv:2107.11222 · cs.SD · September 27, 2021

TL;DR

This paper introduces a multi-channel speech enhancement method that fuses time-domain and frequency-domain features through 2-D convolutional layers and adds a pre-trained acoustic model in a multi-task learning setup, improving PESQ by 0.24 over the baseline.

Contribution

It presents a new two-stage feature fusion approach and integrates a pre-trained acoustic model within a multi-task learning framework for enhanced speech quality.

Findings

01 PESQ improved by 0.24 over baseline

02 Effective multi-channel feature fusion demonstrated

03 Pre-trained acoustic model constrains distortion

Abstract

We propose a multi-channel speech enhancement approach with a novel two-stage feature fusion method and a pre-trained acoustic model in a multi-task learning paradigm. In the first fusion stage, the time-domain and frequency-domain features are extracted separately. In the time domain, the multi-channel convolution sum (MCS) and the inter-channel convolution differences (ICDs) features are computed and then integrated with the first 2-D convolutional layer, while in the frequency domain, the log-power spectra (LPS) features from both original channels and super-directive beamforming outputs are combined with a second 2-D convolutional layer. To fully integrate the rich information of multi-channel speech, i.e. time-frequency domain features and the array geometry, we apply a third 2-D convolutional layer in the second fusion stage to obtain the final convolutional features. Furthermore,…
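The frequency-domain branch described in the abstract can be illustrated with a minimal sketch, assuming NumPy only. It computes log-power spectra (LPS) for each microphone channel and mixes the channels with one plain 2-D convolution. The function names `log_power_spectra` and `conv2d_fuse`, the frame size, hop, kernel size, and random weights are all hypothetical choices for illustration, not the authors' configuration. The super-directive beamforming outputs and the time-domain MCS/ICD features are left out because the abstract does not specify them in enough detail.

```python
import numpy as np

def log_power_spectra(x, n_fft=512, hop=256, eps=1e-10):
    """LPS per channel. x: (channels, samples) -> (channels, frames, n_fft // 2 + 1).

    Frame length, hop, and the Hann window are assumptions, not values from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (x.shape[1] - n_fft) // hop
    # Index matrix of shape (frames, n_fft) selecting each analysis frame.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[:, idx] * window               # (channels, frames, n_fft)
    spec = np.fft.rfft(frames, axis=-1)       # one-sided spectrum per frame
    return np.log(np.abs(spec) ** 2 + eps)    # log power, eps avoids log(0)

def conv2d_fuse(feats, kernels):
    """Naive 'valid' 2-D convolution (cross-correlation, as in most DL toolkits).

    feats:   (c_in, frames, bins)
    kernels: (c_out, c_in, k_t, k_f), hypothetical weights
    Each output map sums over all input channels, which is what mixes the
    per-microphone features together.
    """
    c_out, c_in, k_t, k_f = kernels.shape
    _, n_t, n_f = feats.shape
    out_t, out_f = n_t - k_t + 1, n_f - k_f + 1
    out = np.zeros((c_out, out_t, out_f))
    for i in range(k_t):
        for j in range(k_f):
            patch = feats[:, i:i + out_t, j:j + out_f]           # (c_in, out_t, out_f)
            out += np.einsum("oc,ctf->otf", kernels[:, :, i, j], patch)
    return out

rng = np.random.default_rng(0)
mics = rng.standard_normal((4, 16000))        # stand-in for a 4-microphone recording
lps = log_power_spectra(mics)                 # per-channel LPS features
fused = conv2d_fuse(lps, 0.1 * rng.standard_normal((8, 4, 3, 3)))
print(lps.shape, fused.shape)
```

In a trained system the kernel weights would be learned jointly with the rest of the network; the random weights here only show how the tensor shapes flow from per-channel LPS into fused feature maps.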

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing