Multi-Talker MVDR Beamforming Based on Extended Complex Gaussian Mixture Model

Hangting Chen; Pengyuan Zhang; Yonghong Yan

arXiv:1910.07753 · eess.AS · October 18, 2019 · 1 citation

Open Access

TL;DR

This paper introduces a multi-talker MVDR beamformer based on an extended complex Gaussian mixture model (CGMM), used as the speech recognition front-end for noisy recordings with overlapping talkers; on CHiME-5 it reduces the word error rate (WER) by 13.87% absolute.

Contribution

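The paper extends the complex Gaussian mixture model (CGMM) used for mask estimation in MVDR beamforming with a third component for an interfering speaker, and uses mixture coefficients as supervision to produce more precise masks and avoid the permutation problem.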
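
As a rough illustration of how mixture coefficients can steer a CGMM, the sketch below computes per-frame posteriors (the time-frequency masks) for one frequency bin with three classes, read here as target speaker, interfering speaker, and noise. It follows the standard CGMM E-step, in which class k models each observation as a zero-mean complex Gaussian with covariance φ_{k,t} R_k, with φ set to its maximum-likelihood value given R_k. It assumes that the supervision enters as per-frame prior weights alpha[t, k]; the function name `cgmm_posteriors` and this reading of the supervision are illustrative assumptions, not the authors' code.

```python
import numpy as np


def cgmm_posteriors(Y, R, alpha, eps=1e-10):
    """CGMM E-step for one frequency bin (illustrative sketch).

    Y:     (T, M) complex STFT frames.
    R:     (K, M, M) Hermitian positive-definite spatial covariance per class,
           e.g. K = 3 for target speaker, interfering speaker and noise.
    alpha: (T, K) mixture coefficients; ASSUMPTION: used as per-frame priors
           that supervise the posteriors (rows should sum to 1).
    Returns (T, K) posteriors, usable as time-frequency masks.
    """
    T, M = Y.shape
    K = R.shape[0]
    log_lik = np.empty((T, K))
    for k in range(K):
        R_inv = np.linalg.inv(R[k])
        # Quadratic form y_t^H R_k^{-1} y_t for every frame (real for Hermitian R_k).
        quad = np.real(np.einsum("tm,mn,tn->t", Y.conj(), R_inv, Y))
        # Time-varying scale phi_{k,t}, set to its maximum-likelihood value.
        phi = np.maximum(quad / M, eps)
        _, logdet = np.linalg.slogdet(R[k])
        # log N_c(y_t; 0, phi_{k,t} R_k)
        log_lik[:, k] = -M * np.log(np.pi) - M * np.log(phi) - logdet - quad / phi
    # ASSUMPTION: mixture coefficients act as log-prior weights per frame.
    log_post = np.log(np.maximum(alpha, eps)) + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)  # stabilize before exp
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)
```

With identical spatial covariances the likelihoods cancel and the posteriors reduce to alpha, which isolates the role of the mixture coefficients as a prior; when the covariances differ, the spatial likelihood shifts the mask toward the class whose covariance matches the observation.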

Findings

01

Achieved a 13.87% absolute WER reduction on the CHiME-5 dataset.

02

Effectively separates overlapping speakers in noisy environments.

03

Improves noise reduction and target speaker extraction.

Abstract

In this letter, we present a novel multi-talker minimum variance distortionless response (MVDR) beamformer as the front-end of an automatic speech recognition (ASR) system in a dinner party scenario. The CHiME-5 dataset is selected to evaluate our proposal in overlapping multi-talker scenarios with severe noise. A detailed study on beamforming is conducted based on the proposed extended complex Gaussian mixture model (CGMM) integrated with various speech separation and speech enhancement masks. Three main changes are made to adapt the original CGMM-based MVDR to the multi-talker scenario. First, the number of Gaussian distributions is extended to 3 with an additional interference speaker model. Second, the mixture coefficients are introduced as a supervisor to generate more elaborate masks and avoid the permutation problem. Moreover, we reorganize the MVDR and mask-based speech…
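
Once masks are available, a mask-based MVDR beamformer is typically built from mask-weighted spatial covariance matrices. The sketch below uses one common formulation (Souden et al.), w = Φ_N^{-1} Φ_S u / tr(Φ_N^{-1} Φ_S), where u selects a reference microphone; it is not necessarily the reorganized MVDR described in the paper, and the helper names are hypothetical. For the multi-talker case one would, as an assumption, take Φ_S from the target-speaker mask and Φ_N from the combined interfering-speaker and noise masks.

```python
import numpy as np


def masked_scm(Y, mask):
    """Mask-weighted spatial covariance matrix for one frequency bin.

    Y: (T, M) complex STFT frames; mask: (T,) non-negative weights.
    Returns sum_t mask[t] * y_t y_t^H / sum_t mask[t], shape (M, M).
    """
    weighted = Y * mask[:, None]
    return weighted.T @ Y.conj() / max(mask.sum(), 1e-10)


def mvdr_weights(phi_s, phi_n, ref=0):
    """MVDR weights in the Souden form: Phi_n^{-1} Phi_s u / trace(Phi_n^{-1} Phi_s)."""
    num = np.linalg.solve(phi_n, phi_s)  # Phi_n^{-1} Phi_s without an explicit inverse
    return num[:, ref] / np.trace(num)


def apply_beamformer(w, Y):
    """Enhanced signal w^H y_t for every frame t; returns shape (T,)."""
    return Y @ w.conj()


# Hypothetical usage with CGMM posteriors `post` of shape (T, 3)
# (ASSUMPTION: class 0 = target, 1 = interfering speaker, 2 = noise):
# phi_s = masked_scm(Y, post[:, 0])
# phi_n = masked_scm(Y, post[:, 1] + post[:, 2])
# enhanced = apply_beamformer(mvdr_weights(phi_s, phi_n), Y)
```

For a rank-one target covariance Φ_S = σ d d^H, this formula gives w^H d = d_ref, so the target as observed at the reference microphone passes through undistorted; solving the linear system rather than forming Φ_N^{-1} explicitly is the usual numerically safer choice.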

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques