Contaminated speech training methods for robust DNN-HMM distant speech recognition

Mirco Ravanelli; Maurizio Omologo

arXiv:1710.03538·eess.AS·October 11, 2017

TL;DR

This paper revisits contaminated speech training for robust distant speech recognition with DNN-HMM systems and proposes three techniques that significantly reduce error rates in adverse acoustic conditions.

Contribution

It introduces asymmetric context windowing, close-talk supervision, and pre-training methods to enhance contaminated speech training for DNN-HMM based recognition.
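
As a rough illustration of asymmetric context windowing, the sketch below stacks each feature frame with an unequal number of past and future frames before it is fed to the DNN. It is a minimal sketch, not the authors' implementation: the function name `splice_frames`, the parameters `n_past` and `n_future`, the edge padding, and the particular split (3 past, 1 future) are all assumptions chosen for illustration.

```python
import numpy as np

def splice_frames(features, n_past, n_future):
    """Stack each frame with n_past preceding and n_future following frames.

    features: array of shape (num_frames, feat_dim).
    Returns an array of shape (num_frames, (n_past + n_future + 1) * feat_dim).
    Hypothetical helper for illustration only.
    """
    num_frames, _ = features.shape
    # Repeat the first/last frame so edge frames still get a full window
    # (an assumption; other padding schemes are possible).
    padded = np.pad(features, ((n_past, n_future), (0, 0)), mode="edge")
    # Offset k selects the frame at relative position k - n_past.
    windows = [padded[k:k + num_frames] for k in range(n_past + n_future + 1)]
    return np.concatenate(windows, axis=1)

feats = np.arange(12, dtype=float).reshape(6, 2)  # 6 frames, 2 coefficients each
symmetric = splice_frames(feats, n_past=2, n_future=2)
asymmetric = splice_frames(feats, n_past=3, n_future=1)  # illustrative split
```

Both windows have the same total length, so the DNN input size is unchanged; only the balance between past and future context differs.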

Findings

01. 15% error rate reduction with proposed methods

02. Effective on both real and simulated data

03. Works with small and large training sets

Abstract

Despite the significant progress made in the last years, state-of-the-art speech recognition technologies provide a satisfactory performance only in the close-talking condition. Robustness of distant speech recognition in adverse acoustic conditions, on the other hand, remains a crucial open issue for future applications of human-machine interaction. To this end, several advances in speech enhancement, acoustic scene analysis as well as acoustic modeling, have recently contributed to improve the state-of-the-art in the field. One of the most effective approaches to derive a robust acoustic modeling is based on using contaminated speech, which proved helpful in reducing the acoustic mismatch between training and testing conditions. In this paper, we revise this classical approach in the context of modern DNN-HMM systems, and propose the adoption of three methods, namely, asymmetric…
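
Contaminated speech is commonly produced by convolving close-talk recordings with a room impulse response and adding background noise. The sketch below shows that general recipe under stated assumptions: the function name `contaminate`, the synthetic impulse response, the white noise, and the 10 dB SNR are all hypothetical and are not taken from the paper or from any repository listed on this page.

```python
import numpy as np

def contaminate(clean, impulse_response, noise, snr_db):
    """Reverberate a clean signal and add noise at the requested SNR (dB).

    Hypothetical helper; assumes `noise` is at least as long as `clean`.
    """
    # Convolution with the room impulse response simulates reverberation;
    # the tail is truncated to keep the original signal length.
    reverberated = np.convolve(clean, impulse_response)[:len(clean)]
    noise = noise[:len(reverberated)]
    # Scale the noise so that signal power / noise power matches snr_db.
    signal_power = np.mean(reverberated ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return reverberated + gain * noise

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)             # stand-in for 1 s of speech at 16 kHz
ir = np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800)  # toy decaying IR
ir[0] = 1.0                                    # direct path
noise = rng.standard_normal(16000)             # stand-in for recorded noise
distant = contaminate(clean, ir, noise, snr_db=10.0)
```

In practice the impulse responses and noise would be measured or simulated for the target environment; the random signals here only keep the example self-contained.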

Code & Models

Repositories

mravanelli/pySpeechRev
