Unsupervised Neural Mask Estimator For Generalized Eigen-Value Beamforming Based ASR

Rohit Kumar; Anirudh Sreeram; Anurenjan Purushothaman; Sriram Ganapathy

arXiv:1911.12617·eess.AS·December 2, 2019


TL;DR

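This paper introduces an unsupervised neural mask estimator for generalized eigen-value beamforming in ASR. Because it needs no clean reference data, it can be trained on real noisy recordings, and its ASR results are significantly better than those of out-of-domain teacher models and comparable to those of oracle mask estimators trained on in-domain data.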
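In generalized eigen-value (GEV) beamforming, the estimated speech and noise masks weight the multi-channel STFT to form speech and noise spatial covariance matrices per frequency bin, and the beamforming weights are the principal generalized eigenvector of that matrix pair (the filter that maximizes output SNR). The following is a minimal NumPy/SciPy sketch of this standard step, assuming STFT arrays shaped (frequency, channel, frame) and masks shaped (frequency, frame) with values in [0, 1]. The function names are hypothetical, and postfilters such as blind analytic normalization are omitted.

```python
import numpy as np
from scipy.linalg import eigh


def spatial_covariance(stft, mask):
    """Mask-weighted spatial covariance matrix for every frequency bin.

    stft: complex array (F, D, T) -- frequency bins, channels, frames
    mask: real array (F, T) with values in [0, 1]
    returns: complex array (F, D, D)
    """
    weighted = stft * mask[:, None, :]                  # same mask on every channel
    psd = np.einsum("fdt,fet->fde", weighted, stft.conj())  # sum_t m x x^H
    norm = np.maximum(mask.sum(axis=-1), 1e-10)         # avoid division by zero
    return psd / norm[:, None, None]


def gev_weights(stft, speech_mask, noise_mask, eps=1e-6):
    """Max-SNR (GEV) beamformer: principal generalized eigenvector per bin."""
    phi_x = spatial_covariance(stft, speech_mask)
    phi_n = spatial_covariance(stft, noise_mask)
    num_freq, num_ch, _ = phi_x.shape
    weights = np.empty((num_freq, num_ch), dtype=complex)
    for f in range(num_freq):
        # Diagonal loading keeps the noise matrix positive definite.
        load = eps * (np.trace(phi_n[f]).real / num_ch + 1e-10)
        noise = phi_n[f] + load * np.eye(num_ch)
        # eigh(a, b) solves a v = lambda b v; eigenvalues come back ascending,
        # so the last eigenvector maximizes the speech-to-noise power ratio.
        _, vecs = eigh(phi_x[f], noise)
        weights[f] = vecs[:, -1]
    return weights


def apply_beamformer(stft, weights):
    """Beamformer output y(f, t) = w(f)^H x(f, t), shape (F, T)."""
    return np.einsum("fd,fdt->ft", weights.conj(), stft)
```

A typical call is `w = gev_weights(X, speech_mask, noise_mask)` followed by `Y = apply_beamformer(X, w)`. The diagonal loading is a design choice that keeps the generalized eigenproblem well posed when the noise mask selects few frames.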

Contribution

It proposes an unsupervised method for training the neural mask estimator used in beamforming, removing the need for clean speech data during training.
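Without clean speech, the mask targets have to be derived from the noisy data itself. This page does not state exactly how the targets are formed, so the sketch below assumes a hypothetical rule: an enhanced signal (for example, the output of a dereverberation and beamforming front end) stands in for speech, the residual between the noisy and enhanced signals stands in for noise, and their powers give a ratio-mask target. Binary cross-entropy, a common training loss for such mask estimators, then scores a predicted mask against that target. The names `pseudo_mask_targets` and `bce_loss` are illustrative, not from the paper.

```python
import numpy as np


def pseudo_mask_targets(noisy, enhanced, eps=1e-10):
    """Speech/noise mask targets derived without a clean reference.

    noisy, enhanced: complex STFTs (F, T) of the same reference channel.
    Hypothetical rule: the enhanced signal stands in for speech and the
    residual (noisy - enhanced) stands in for noise.
    """
    speech_pow = np.abs(enhanced) ** 2
    noise_pow = np.abs(noisy - enhanced) ** 2
    speech_mask = speech_pow / (speech_pow + noise_pow + eps)
    return speech_mask, 1.0 - speech_mask


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy between a predicted mask and a target mask."""
    pred = np.clip(pred, eps, 1.0 - eps)  # keep log() finite
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))
```

In training, `pred` would be the network's sigmoid output for each time-frequency bin; the sketch only computes the loss value, not the gradient update.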

Findings

1. Significantly better ASR performance than out-of-domain teacher models.
2. Results comparable to oracle mask estimators trained on in-domain data.
3. Effective in noisy and reverberant environments.

Abstract

The state-of-the-art methods for acoustic beamforming in multi-channel ASR are based on a neural mask estimator that predicts the presence of speech and noise. These models are trained using a paired corpus of clean and noisy recordings (teacher model). In this paper, we attempt to move away from the requirement of having supervised clean recordings for training the mask estimator. Instead, models based on signal enhancement and beamforming using multi-channel linear prediction provide the required mask estimates. In this way, the model can also be trained on real recordings of noisy speech rather than only on the simulated recordings used in a typical teacher model. Several experiments performed in noisy and reverberant environments on the CHiME-3 corpus as well as the REVERB challenge corpus highlight the effectiveness of the proposed approach. The ASR results for the proposed approach…
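Multi-channel linear prediction for dereverberation is commonly realized with the weighted prediction error (WPE) method: in each frequency bin, late reverberation in the current frame is predicted from delayed past frames of all channels and subtracted, and the prediction filter is re-estimated under a time-varying power estimate of the desired signal. The sketch below is a simplified, unoptimized WPE in NumPy, offered as an assumption about what such an MCLP front end looks like rather than the paper's configuration. The function name and the `taps`, `delay`, and `iterations` defaults are illustrative.

```python
import numpy as np


def wpe_dereverb(stft, taps=10, delay=3, iterations=3, eps=1e-10):
    """Simplified WPE-style multi-channel linear prediction dereverberation.

    stft: complex array (F, D, T) -- frequency bins, channels, frames.
    Returns an array of the same shape with late reverberation suppressed.
    """
    num_freq, num_ch, num_frames = stft.shape
    out = np.empty(stft.shape, dtype=complex)
    for f in range(num_freq):
        y = stft[f]  # (D, T) observation in this frequency bin
        # Stack delayed copies: block k holds y delayed by (delay + k) frames.
        y_tilde = np.zeros((num_ch * taps, num_frames), dtype=complex)
        for k in range(taps):
            shift = delay + k
            if shift >= num_frames:
                break
            y_tilde[k * num_ch:(k + 1) * num_ch, shift:] = y[:, :num_frames - shift]
        z = y
        for _ in range(iterations):
            # Time-varying power of the current desired-signal estimate.
            power = np.maximum(np.mean(np.abs(z) ** 2, axis=0), eps)  # (T,)
            weighted = y_tilde / power
            r = weighted @ y_tilde.conj().T  # (D*K, D*K) weighted correlation
            p = weighted @ y.conj().T        # (D*K, D) weighted cross-correlation
            g = np.linalg.solve(r + eps * np.eye(num_ch * taps), p)
            # Subtract the predicted late reverberation from every channel.
            z = y - g.conj().T @ y_tilde
        out[f] = z
    return out
```

The `delay` keeps the direct path and early reflections out of the prediction, so only late reverberation is removed; the explicit loops favor readability over speed.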
