Multi-channel Speech Enhancement with 2-D Convolutional Time-frequency Domain Features and a Pre-trained Acoustic Model
Quandong Wang, Junnan Wu, Zhao Yan, Sichong Qian, Liyong Guo, Lichun Fan, Weiji Zhuang, Peng Gao, Yujun Wang

TL;DR
This paper introduces a multi-channel speech enhancement method that fuses time-domain and frequency-domain features in two stages and adds a pre-trained acoustic model, improving PESQ by 0.24 over the baseline.
Contribution
It presents a new two-stage feature fusion approach and integrates a pre-trained acoustic model within a multi-task learning framework for enhanced speech quality.
Findings
PESQ improved by 0.24 over baseline
Effective multi-channel feature fusion demonstrated
Pre-trained acoustic model constrains distortion
Abstract
We propose a multi-channel speech enhancement approach with a novel two-stage feature fusion method and a pre-trained acoustic model in a multi-task learning paradigm. In the first fusion stage, the time-domain and frequency-domain features are extracted separately. In the time domain, the multi-channel convolution sum (MCS) and the inter-channel convolution differences (ICDs) features are computed and then integrated with the first 2-D convolutional layer, while in the frequency domain, the log-power spectra (LPS) features from both original channels and super-directive beamforming outputs are combined with a second 2-D convolutional layer. To fully integrate the rich information of multi-channel speech, i.e. time-frequency domain features and the array geometry, we apply a third 2-D convolutional layer in the second fusion stage to obtain the final convolutional features. Furthermore,…
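The frequency-domain branch described above starts from log-power spectra (LPS) computed per channel. The following is a minimal sketch of that one step, not the authors' implementation: the function name `log_power_spectra`, the frame length, hop size, Hann window, and the random 4-microphone input are all assumptions chosen for illustration. The MCS and ICD features and the beamforming outputs are not reproduced here because the abstract does not define them.

```python
import numpy as np

def log_power_spectra(signals, frame_len=512, hop=256, eps=1e-10):
    """Compute LPS features for a multi-channel signal.

    signals: array of shape (channels, samples).
    Returns an array of shape (channels, frames, frame_len // 2 + 1).
    Frame length, hop, and window are illustrative assumptions.
    """
    signals = np.asarray(signals, dtype=np.float64)
    n_ch, n_samples = signals.shape
    n_frames = 1 + (n_samples - frame_len) // hop
    window = np.hanning(frame_len)
    # Index matrix that selects overlapping frames: (frames, frame_len)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signals[:, idx] * window          # (channels, frames, frame_len)
    spectrum = np.fft.rfft(frames, axis=-1)    # one-sided spectrum per frame
    # eps keeps the logarithm finite for silent bins
    return np.log(np.abs(spectrum) ** 2 + eps)

# Hypothetical input: 4 microphones, 1 second at 16 kHz
rng = np.random.default_rng(0)
mics = rng.standard_normal((4, 16000))
lps = log_power_spectra(mics)
print(lps.shape)
```

The channel axis comes first so that the per-channel spectrograms can be stacked as input planes to a 2-D convolutional layer, which is one plausible reading of how the abstract combines them.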
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
