The PCG-AIID System for L3DAS22 Challenge: MIMO and MISO convolutional recurrent Network for Multi Channel Speech Enhancement and Speech Recognition

Jingdong Li; Yuanyuan Zhu; Dawei Luo; Yun Liu; Guohui Cui; Zhaoxia Li

arXiv:2202.10017·eess.AS·February 22, 2022

TL;DR

This paper presents a two-stage multi-channel speech enhancement system built from MIMO and MISO convolutional recurrent networks; it ranked 3rd in the L3DAS22 challenge (Task 1: 3D speech enhancement in a reverberant office environment).

Contribution

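The paper introduces a two-stage framework for multi-channel speech denoising and dereverberation: a MIMO network first removes background noise while maintaining the spatial characteristics of the multi-channel signals, and a MISO network then enhances the speech from the desired direction and applies post-filtering.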

Findings

01 Ranked 3rd in the L3DAS22 challenge (Task 1)

02 Achieved 3.2% WER on the blind test set

03 Attained 0.972 STOI on the blind test set

Abstract

This paper describes the PCG-AIID system for Task 1 of the L3DAS22 challenge: 3D speech enhancement in a reverberant office environment. We propose a two-stage framework to address multi-channel speech denoising and dereverberation. In the first stage, a multiple-input multiple-output (MIMO) network removes background noise while maintaining the spatial characteristics of the multi-channel signals. In the second stage, a multiple-input single-output (MISO) network enhances the speech from the desired direction and performs post-filtering. Our system ranked 3rd in the ICASSP 2022 L3DAS22 challenge and significantly outperforms the baseline system, achieving 3.2% WER and 0.972 STOI on the blind test set.
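The data flow of the two-stage design can be illustrated with a minimal sketch. This is not the authors' model: the abstract gives no architectural details, so both stages are replaced by trivial stand-ins (a shared scalar gain for the MIMO stage and a channel average for the MISO stage), and the names `mimo_stage`, `miso_stage`, and `enhance` are hypothetical. The only point is the shape contract: C channels in and C channels out for stage one, then C channels in and one channel out for stage two.

```python
from typing import List

Signal = List[float]          # samples of a single channel
MultiChannel = List[Signal]   # C channels of equal length


def mimo_stage(mix: MultiChannel, gain: float = 0.5) -> MultiChannel:
    """Stand-in for the MIMO network: C channels in, C channels out.

    ASSUMPTION: a real model would estimate each channel with a neural
    network. Here one shared scalar gain is applied to every channel, so
    the relative levels between channels (the spatial cue) are unchanged.
    """
    return [[gain * sample for sample in channel] for channel in mix]


def miso_stage(channels: MultiChannel) -> Signal:
    """Stand-in for the MISO network: C channels in, one channel out.

    ASSUMPTION: a plain average across channels replaces the learned
    spatial filtering and post-filtering described in the abstract.
    """
    count = len(channels)
    return [sum(samples) / count for samples in zip(*channels)]


def enhance(mix: MultiChannel) -> Signal:
    """Chain the two stages: MIMO output feeds the MISO stage."""
    return miso_stage(mimo_stage(mix))


if __name__ == "__main__":
    mix = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]  # 2 channels, 3 samples
    stage_one = mimo_stage(mix)
    print(len(stage_one), len(stage_one[0]))  # channel count is preserved
    print(enhance(mix))                       # a single output channel
```

Keeping the first stage multi-channel is what lets the second stage still exploit inter-channel differences; collapsing to one channel too early would discard them.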

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques