Stream Attention for far-field multi-microphone ASR

Xiaofei Wang; Yonghong Yan; Hynek Hermansky

arXiv:1711.11141 · cs.SD · December 1, 2017 · 2 cites

TL;DR

This paper introduces a stream attention framework that enhances far-field multi-microphone ASR by focusing on more reliable microphone streams, leading to significant WER improvements.

Contribution

It proposes a novel attention scheme that predicts each microphone's reliability from its phoneme posteriors and uses these predictions to improve multi-microphone ASR performance.
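The summary does not say which predictor maps a stream's phoneme posteriors to a reliability estimate. As a minimal sketch, assuming a performance-monitoring score in the style of the mean temporal distance (M-measure) is an acceptable stand-in, the Python below scores one stream's posteriogram by averaging the symmetric KL divergence between posterior vectors a few frames apart. The underlying assumption is that clean speech produces sharper, faster-changing posteriors and therefore larger values. The function name `m_measure`, the lag range, and the clipping constant are hypothetical illustrative choices, not the paper's.

```python
import numpy as np


def m_measure(posteriors, deltas=range(1, 51), eps=1e-10):
    """Hypothetical reliability score for one microphone stream.

    posteriors: (T, K) array; each row is a phoneme posterior distribution.
    Returns the mean, over the frame lags in `deltas`, of the average
    symmetric KL divergence between posterior vectors that far apart.
    Higher values are ASSUMED to indicate a cleaner, more reliable stream.
    """
    # Clip and renormalize so the logarithms below are finite.
    p = np.clip(np.asarray(posteriors, dtype=float), eps, 1.0)
    p = p / p.sum(axis=1, keepdims=True)
    log_p = np.log(p)
    num_frames = p.shape[0]

    per_lag = []
    for d in deltas:
        if d >= num_frames:
            break  # no frame pairs remain at this lag
        a, b = p[:-d], p[d:]
        log_a, log_b = log_p[:-d], log_p[d:]
        # Symmetric KL: KL(a||b) + KL(b||a) = sum_k (a - b) * (log a - log b)
        sym_kl = np.sum((a - b) * (log_a - log_b), axis=1)
        per_lag.append(sym_kl.mean())

    return float(np.mean(per_lag)) if per_lag else 0.0
```

Calling `m_measure` on the posteriogram of each microphone yields one scalar per stream. A flat, uninformative posteriogram scores near zero, while a sharp one that changes phone identity over time scores higher.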

Findings

01 Substantial word error rate (WER) reduction.
02 Effective prediction of microphone reliability.
03 Improved ASR accuracy on real recorded data.

Abstract

A stream attention framework has been applied to the posterior probabilities of the deep neural network (DNN) to improve far-field automatic speech recognition (ASR) performance in the multi-microphone configuration. The stream attention scheme has been realized through an attention vector, which is derived by predicting the ASR performance from the phoneme posterior distribution of each individual microphone stream and which focuses the recognizer's attention on the more reliable microphones. An investigation of various ASR performance measures has been carried out using a real recorded dataset. Experimental results show that the proposed framework yields substantial improvements in word error rate (WER).
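The fusion step can be sketched as follows. This sketch assumes that each stream already has a scalar reliability score (for example, the negative of a predicted WER) and that the attention vector is a softmax over those scores. The name `stream_attention_fusion`, the `temperature` parameter, and the linear weighted average of posteriors are hypothetical choices for illustration. The abstract does not specify the exact combination rule.

```python
import numpy as np


def stream_attention_fusion(stream_posteriors, reliability, temperature=1.0):
    """Hypothetical attention-weighted fusion of per-stream DNN posteriors.

    stream_posteriors: (S, T, K) array for S microphone streams,
        T frames, and K phoneme classes.
    reliability: (S,) predicted reliability per stream (higher = better);
        how it is predicted is an ASSUMPTION left to the caller.
    temperature: softmax temperature; smaller values concentrate attention.
    Returns (attention, fused), with shapes (S,) and (T, K).
    """
    post = np.asarray(stream_posteriors, dtype=float)
    scores = np.asarray(reliability, dtype=float) / temperature

    # Softmax over streams, shifted by the max for numerical stability.
    weights = np.exp(scores - scores.max())
    attention = weights / weights.sum()

    # Frame-by-frame weighted average: contract the stream axis.
    fused = np.tensordot(attention, post, axes=1)
    return attention, fused
```

For example, with three streams and reliabilities `[2.0, 0.0, 0.0]`, the first microphone receives most of the attention. The fused output still has one posterior vector per frame. A linear average of probability vectors keeps every fused frame a valid distribution. Averaging log-posteriors instead is an equally plausible variant, but it would need renormalization.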

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Indoor and Outdoor Localization Technologies