3-D Feature and Acoustic Modeling for Far-Field Speech Recognition

Anurenjan Purushothaman; Anirudh Sreeram; Sriram Ganapathy

arXiv:1911.05504 · eess.AS · January 28, 2020

TL;DR

This paper introduces a novel 3-D feature extraction and acoustic modeling approach for far-field speech recognition that combines multivariate autoregressive (MAR) modeling with 3-D CNNs and outperforms traditional beamforming-based methods.
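The core idea of MAR modeling, predicting every channel from the recent past of all channels, can be illustrated with a generic least-squares fit. This is a minimal sketch under stated assumptions: it fits a time-domain model x[n] = sum_k A_k x[n-k] + e[n] to a multi-channel signal with NumPy, and the function name `fit_mar` is hypothetical. The paper derives its MAR features differently (exploiting correlations across time, frequency and channel), so this is not the authors' feature extractor, only an illustration of how cross-channel prediction coefficients are estimated.

```python
import numpy as np


def fit_mar(x: np.ndarray, order: int) -> np.ndarray:
    """Least-squares fit of a multivariate AR model (hypothetical helper).

    Model: x[n] = sum_{k=1..order} A_k @ x[n-k] + e[n]
    x:      signal of shape (num_samples, num_channels)
    order:  number of past samples used for prediction
    Returns A of shape (order, num_channels, num_channels), with A[k-1] = A_k.
    """
    n, c = x.shape
    # Each regressor row stacks [x[t-1], x[t-2], ..., x[t-order]] for one target time t.
    X = np.hstack([x[order - k:n - k] for k in range(1, order + 1)])  # (n - order, order * c)
    Y = x[order:]                                                     # (n - order, c)
    coeffs, *_ = np.linalg.lstsq(X, Y, rcond=None)                    # (order * c, c)
    # Rows k*c:(k+1)*c of coeffs map x[t-k-1] to x[t]; transpose so that A_k @ x works.
    return coeffs.reshape(order, c, c).transpose(0, 2, 1)
```

The off-diagonal entries of each A_k describe how one microphone's past predicts another microphone's present, which is exactly the cross-channel structure a per-channel AR model would ignore.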

Contribution

The paper proposes extracting features directly from the multi-channel signal and modeling them with a 3-D CNN acoustic model, which removes the need for beamforming-based enhancement in reverberant conditions.
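To make the "3-D" in 3-D CNN concrete, the sketch below shows the core operation of a single 3-D convolutional filter applied to a feature tensor laid out as (channel, frequency, time). It is an illustration under assumptions, not the paper's architecture: the tensor and kernel sizes are hypothetical, and a real layer would add many filters, biases, nonlinearities and training. It relies on NumPy's `sliding_window_view` (NumPy 1.20 or later).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Valid-mode 3-D cross-correlation, the operation deep-learning 'conv' layers compute.

    x: feature tensor of shape (channels, freq, time)
    w: kernel of shape (kc, kf, kt)
    Returns an array of shape (channels-kc+1, freq-kf+1, time-kt+1).
    """
    windows = sliding_window_view(x, w.shape)  # (..., kc, kf, kt) patches
    return np.einsum("cftijk,ijk->cft", windows, w)


# Hypothetical sizes: 6 microphones, 40 sub-bands, 100 frames.
features = np.random.default_rng(0).standard_normal((6, 40, 100))
kernel = np.random.default_rng(1).standard_normal((6, 3, 3))
print(conv3d_valid(features, kernel).shape)  # (1, 38, 98)
```

Because this kernel spans all six channels, the channel axis collapses to 1: the filter learns its own channel combination jointly with time-frequency filtering and is trained against the recognition objective, whereas a beamformer fixes that combination before feature extraction.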

Findings

01. Significant word error rate (WER) reduction on the CHiME-3 dataset

02. Improved recognition accuracy on the REVERB Challenge dataset

03. Outperforms traditional beamforming-based systems
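The findings are stated in terms of word error rate (WER). As a reminder of what that metric counts, here is a minimal sketch of the standard definition, (substitutions + deletions + insertions) / number of reference words, computed with a word-level Levenshtein distance. The function name is illustrative; scoring tools typically also apply text normalization, which is not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference words).

    Assumes a non-empty reference transcript.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 0.3333333333333333
```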

Abstract

Automatic speech recognition in multi-channel reverberant conditions is a challenging task. The conventional way of suppressing reverberation artifacts involves beamforming-based enhancement of the multi-channel speech signal, which is then used to extract spectrogram-based features for a neural network acoustic model. In this paper, we propose to extract features directly from the multi-channel speech signal using a multivariate autoregressive (MAR) modeling approach, where the correlations among all three dimensions of time, frequency and channel are exploited. The MAR features are fed to a convolutional neural network (CNN) architecture which performs joint acoustic modeling over the three dimensions. The 3-D CNN architecture allows multi-channel features to be combined in a way that optimizes the speech recognition cost, compared to traditional beamforming models that focus on…
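For contrast with the proposed approach, the conventional enhancement front end mentioned in the abstract can be sketched as a delay-and-sum beamformer: align the channels on the source and average them. This is a simplified illustration assuming the per-channel delays are already known integers (practical systems estimate them, for example with GCC-PHAT, and often use weighted or adaptive beamformers); it is not the specific baseline beamformer used in the paper, and `delay_and_sum` is a hypothetical name.

```python
import numpy as np


def delay_and_sum(x: np.ndarray, delays: list[int]) -> np.ndarray:
    """Align each channel by a known integer delay and average (hypothetical helper).

    x:      multi-channel signal of shape (num_channels, num_samples)
    delays: per-channel arrival delay in samples, relative to the earliest channel
    Returns a single-channel signal of length num_samples - max(delays).
    """
    num_channels, num_samples = x.shape
    out_len = num_samples - max(delays)
    # Skip each channel's extra delay so that all channels line up on the same source sample.
    aligned = np.stack([x[c, d:d + out_len] for c, d in enumerate(delays)])
    return aligned.mean(axis=0)
```

Averaging aligned channels reinforces the direct-path speech and partially cancels uncorrelated noise and reverberation, but the combination rule is fixed in advance rather than learned for the recognition objective, which is the limitation the 3-D CNN approach targets.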

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing