Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition

Sunit Sivasankaran; Emmanuel Vincent; Dominique Fohr

arXiv:1910.11114·eess.AS·October 25, 2019·EUSIPCO

TL;DR

This paper examines how errors in speaker localization affect speech separation and recognition accuracy in multispeaker environments, highlighting the importance of accurate localization for reducing word error rates.

Contribution

It introduces a pipeline combining delay-and-sum beamforming, neural network-based masking, and adaptive beamforming to analyze localization errors' impact on speech recognition.
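The first stage of that pipeline, delay-and-sum beamforming toward a given speaker direction, can be sketched in the STFT domain as follows. This is a minimal illustration and not the authors' implementation: the far-field planar model, the 4-microphone linear array, the speed of sound, and the function names `steering_vector` and `delay_and_sum` are all assumptions made for the example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed room-temperature value)


def steering_vector(mic_positions, doa_deg, freqs):
    """Far-field steering vector for a planar array (hypothetical helper).

    mic_positions: (M, 2) coordinates in metres
    doa_deg:       azimuth of the speaker in degrees
    freqs:         (F,) frequencies in Hz
    Returns an (F, M) complex array.
    """
    theta = np.deg2rad(doa_deg)
    direction = np.array([np.cos(theta), np.sin(theta)])  # unit vector toward the source
    # Microphones closer to the source receive the wavefront earlier (negative delay).
    delays = -(mic_positions @ direction) / SPEED_OF_SOUND  # (M,) seconds
    return np.exp(-2j * np.pi * freqs[:, None] * delays[None, :])


def delay_and_sum(stft, mic_positions, doa_deg, freqs):
    """Phase-align the channels toward doa_deg and average them.

    stft: (M, F, T) complex multichannel STFT. Returns an (F, T) array.
    """
    d = steering_vector(mic_positions, doa_deg, freqs)  # (F, M)
    return np.einsum("fm,mft->ft", d.conj(), stft) / stft.shape[0]


# Synthetic check: one anechoic source at 60 degrees, no noise.
rng = np.random.default_rng(0)
mics = np.array([[0.00, 0.0], [0.05, 0.0], [0.10, 0.0], [0.15, 0.0]])
freqs = np.linspace(0.0, 8000.0, 257)
source = rng.standard_normal((257, 10)) + 1j * rng.standard_normal((257, 10))
true_doa = 60.0
mixture = steering_vector(mics, true_doa, freqs).T[:, :, None] * source[None, :, :]

aligned = delay_and_sum(mixture, mics, true_doa, freqs)
misaligned = delay_and_sum(mixture, mics, true_doa + 30.0, freqs)  # 30 degree localization error

print("error with true DOA: ", np.linalg.norm(aligned - source))
print("error with wrong DOA:", np.linalg.norm(misaligned - source))
```

In this toy setting, steering with the true direction recovers the source exactly, while a localization error leaves the channels out of phase and distorts the output. That is the kind of degradation the paper studies further down the pipeline.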

Findings

01 Ground truth localization yields 29.4% WER
02 Estimated localization yields 42.4% WER
03 Higher SIR significantly reduces WER
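The two WER figures differ by 13.0 percentage points, a relative increase of about 44% when moving from ground truth to estimated localization. For readers unfamiliar with the metric, the sketch below computes WER for a single utterance as the word-level edit distance divided by the reference length. The function name `word_error_rate` and the example sentences are hypothetical, and evaluation toolkits normally accumulate errors over a whole test set rather than per utterance.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"  # one substitution, one deletion
wer = word_error_rate(reference, hypothesis)
print(f"WER = {100 * wer:.1f}%")  # prints WER = 33.3%
```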

Abstract

We investigate the effect of speaker localization on the performance of speech recognition systems in a multispeaker, multichannel environment. Given the speaker location information, speech separation is performed in three stages. In the first stage, a simple delay-and-sum (DS) beamformer is used to enhance the signal impinging from the speaker location. In the second stage, the enhanced signal is used to estimate a time-frequency mask corresponding to the localized speaker using a neural network. In the third stage, this mask is used to compute the second-order statistics and to derive an adaptive beamformer. We generated a multichannel, multispeaker, reverberated, noisy dataset inspired by the well-studied WSJ0-2mix and study the performance of the proposed pipeline in terms of the word error rate (WER). An average WER of 29.4% was achieved using the ground truth localization information and 42.4% using the…
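The third stage, turning a time-frequency mask into second-order statistics and an adaptive beamformer, can be illustrated with the sketch below. The abstract does not say which adaptive beamformer is derived, so the MVDR formulation used here is an assumption, as are the function names, the diagonal loading, and the synthetic data. An oracle binary mask stands in for the neural network output.

```python
import numpy as np


def masked_covariance(stft, mask, eps=1e-10):
    """Mask-weighted spatial covariance. stft: (M, F, T), mask: (F, T) -> (F, M, M)."""
    weighted = stft * mask[None, :, :]
    cov = np.einsum("mft,nft->fmn", weighted, stft.conj())
    return cov / (mask.sum(axis=1)[:, None, None] + eps)


def mvdr_weights(cov_speech, cov_noise, ref_mic=0, diag_load=1e-6):
    """MVDR filter w = (Phi_n^-1 Phi_s) u / trace(Phi_n^-1 Phi_s), one per frequency."""
    n_mics = cov_noise.shape[-1]
    # Small diagonal loading (an assumed value) keeps the noise covariance invertible.
    scale = np.trace(cov_noise, axis1=-2, axis2=-1).real[:, None, None] / n_mics
    cov_noise = cov_noise + diag_load * scale * np.eye(n_mics)
    ratio = np.linalg.solve(cov_noise, cov_speech)            # (F, M, M)
    trace = np.trace(ratio, axis1=-2, axis2=-1)[:, None]      # (F, 1)
    return ratio[:, :, ref_mic] / (trace + 1e-10)             # (F, M)


def apply_beamformer(weights, stft):
    """Apply w^H x per frequency. weights: (F, M), stft: (M, F, T) -> (F, T)."""
    return np.einsum("fm,mft->ft", weights.conj(), stft)


# Synthetic data: a target active in every other frame plus spatially white noise.
rng = np.random.default_rng(0)
n_mics, n_freqs, n_frames = 4, 5, 400


def complex_noise(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


steer = np.exp(1j * rng.uniform(0, 2 * np.pi, (n_mics, n_freqs)))    # (M, F)
active = np.zeros((n_freqs, n_frames))
active[:, ::2] = 1.0
speech = np.sqrt(10.0) * complex_noise(n_freqs, n_frames) * active   # (F, T)
target_image = steer[:, :, None] * speech[None, :, :]                # (M, F, T)
noisy = target_image + complex_noise(n_mics, n_freqs, n_frames)

mask = active  # oracle mask standing in for the network estimate
weights = mvdr_weights(masked_covariance(noisy, mask),
                       masked_covariance(noisy, 1.0 - mask))
enhanced = apply_beamformer(weights, noisy)

err_in = np.mean(np.abs(noisy[0] - target_image[0]) ** 2)
err_out = np.mean(np.abs(enhanced - target_image[0]) ** 2)
print(f"mean squared error at reference mic: {err_in:.3f}, after beamforming: {err_out:.3f}")
```

The speech covariance here is estimated from mask-weighted noisy frames, so it still contains some noise. A more careful variant would subtract the noise covariance or use a different filter such as a multichannel Wiener filter.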

Code & Models

Repositories

sunits/Reverberated_WSJ_2MIX
Official