Deep Learning for Distant Speech Recognition

Mirco Ravanelli

arXiv:1712.06086·cs.CL·December 19, 2017

TL;DR

This paper explores deep learning techniques to enhance distant speech recognition, addressing noise and reverberation challenges through novel architectures, data simulation methods, and cooperative neural network paradigms.

Contribution

The thesis introduces new methodologies for data contamination and for exploiting speech context, along with a network of cooperating deep neural networks for robust distant speech recognition.

Findings

01 Improved acoustic models with enhanced robustness in noisy environments

02 Effective data simulation techniques for training DNNs in DSR

03 Demonstrated benefits of neural network cooperation in noisy conditions
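As a rough illustration of what data contamination usually means in this setting, the sketch below reverberates a close-talking signal with a room impulse response and adds noise at a chosen signal-to-noise ratio. It is a minimal sketch of the general recipe, not the thesis's own pipeline: the function name `contaminate`, the toy exponentially decaying impulse response, and the random placeholder signals are all assumptions made for illustration.

```python
import numpy as np


def contaminate(clean, impulse_response, noise, snr_db):
    """Simulate distant speech: reverberate clean speech, then add noise.

    Hypothetical helper for illustration; all inputs are 1-D float arrays.
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    # Reverberation: convolve with the room impulse response, keep input length.
    reverberant = np.convolve(clean, impulse_response)[: len(clean)]
    noise = noise[: len(reverberant)]
    # Scale the noise so the reverberant-speech-to-noise ratio equals snr_db.
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return reverberant + gain * noise


rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)  # placeholder for 1 s of speech at 16 kHz
# Toy impulse response (assumption): direct path plus decaying random tail.
rir = rng.standard_normal(4000) * np.exp(-np.arange(4000) / 800.0)
rir[0] = 1.0
noise = rng.standard_normal(16000)  # placeholder for recorded background noise
distant = contaminate(clean, rir, noise, snr_db=10.0)
```

In practice the placeholders would be replaced by real clean recordings, measured or simulated impulse responses, and recorded noise; the resulting signals can then serve as training material for distant-talking acoustic models.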

Abstract

Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Among the other achievements, building computers that understand speech represents a crucial leap towards intelligent machines. Despite the great efforts of the past decades, however, a natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. The latter disturbances severely hamper the intelligibility of a speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses the latter scenario and proposes some novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for…
