Deep Learning for Distant Speech Recognition
Mirco Ravanelli

TL;DR
This thesis explores deep learning techniques for distant speech recognition, addressing noise and reverberation through novel architectures, data simulation methods, and cooperative neural network paradigms.
Contribution
It introduces new methodologies for data contamination, speech context exploitation, and a network of deep neural networks for robust distant speech recognition.
Findings
Improved acoustic models with enhanced robustness in noisy environments
Effective data simulation techniques for training DNNs in DSR
Demonstrated benefits of neural network cooperation in noisy conditions
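As a rough illustration of what data simulation for DSR typically involves, the sketch below contaminates a close-talking signal by convolving it with a room impulse response and adding noise at a chosen signal-to-noise ratio. This is the generic textbook recipe, not necessarily the procedure used in the thesis; the function names (convolve, power, contaminate) and the snr_db parameter are assumptions introduced here for illustration. It uses only the standard library to stay self-contained; real pipelines would more likely rely on NumPy or SciPy.

```python
import math


def convolve(signal, kernel):
    """Direct-form linear convolution, truncated to len(signal)."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for k in range(min(n + 1, len(kernel))):
            out[n] += kernel[k] * signal[n - k]
    return out


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def contaminate(clean, rir, noise, snr_db):
    """Hypothetical helper: reverberate clean speech, then add scaled noise."""
    # Reverberation: filter the clean speech with the room impulse response.
    reverberant = convolve(clean, rir)
    # Repeat the noise if it is shorter than the speech, then trim to length.
    reps = math.ceil(len(reverberant) / len(noise))
    noise = (list(noise) * reps)[: len(reverberant)]
    # Scale the noise so that speech power / noise power matches snr_db.
    gain = math.sqrt(power(reverberant) / (power(noise) * 10 ** (snr_db / 10)))
    return [r + gain * v for r, v in zip(reverberant, noise)]
```

Calling contaminate(clean, rir, noise, 10.0) returns a signal of the same length as clean in which the reverberated speech sits 10 dB above the added noise. Here the SNR is measured against the reverberated speech rather than the dry signal, which is one of several possible conventions.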
Abstract
Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Among other achievements, building computers that understand speech represents a crucial leap towards intelligent machines. Despite the great efforts of the past decades, however, natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. Noise and reverberation severely hamper the intelligibility of a speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses this scenario and proposes novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for…
