Analyzing Large Receptive Field Convolutional Networks for Distant Speech Recognition
Salar Jafarlou, Soheil Khorram, Vinay Kothapally, John H.L. Hansen

TL;DR
This paper investigates large receptive field CNN architectures, such as dilated, deeply recursive, and stacked hourglass networks, to improve distant speech recognition (DSR), showing significant WER reductions over standard CNNs in noisy reverberant environments.
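Noisy reverberant distant speech is commonly simulated by convolving close-talk speech with a room impulse response (RIR) and then adding noise at a target signal-to-noise ratio (SNR). The sketch below shows that recipe on toy synthetic data. The helper names (`simulate_distant`, `add_noise_at_snr`) and the SNR-based mixing are assumptions for illustration; this summary does not describe the paper's exact simulation pipeline.

```python
import math
import random

def convolve(x, h):
    """Full linear convolution of signal x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def mean_power(v):
    """Average energy per sample."""
    return sum(s * s for s in v) / len(v)

def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so that 10*log10(P_signal / P_noise) equals snr_db, then add it."""
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[: len(signal)]
    scale = math.sqrt(mean_power(signal) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]

def simulate_distant(clean, rir, noise, snr_db):
    """Hypothetical helper: reverberate clean speech with an RIR, then add noise."""
    reverberant = convolve(clean, rir)[: len(clean)]  # truncate to the clean length
    return add_noise_at_snr(reverberant, noise, snr_db)

# Toy usage with synthetic data: a sine "utterance", a 4-tap "RIR", Gaussian noise.
rng = random.Random(0)
clean = [math.sin(0.1 * t) for t in range(200)]
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
rir = [1.0, 0.0, 0.5, 0.25]
distant = simulate_distant(clean, rir, noise, snr_db=10.0)
```

Real RIRs are thousands of taps long and are usually applied with FFT-based convolution; the direct double loop here only keeps the example dependency-free.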
Contribution
It introduces and compares several large receptive field CNN variants for DSR and shows that they outperform standard CNNs when the number of parameters is held fixed across architectures.
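Comparing architectures at a fixed parameter count works because some ways of enlarging the receptive field cost no extra weights. The sketch below counts receptive field and parameters for a stack of stride-1 1-D convolutions along the time axis; it is a simplification (the networks in the paper may use 2-D time-frequency convolutions), and the layer count, kernel size and channel width are illustrative assumptions, not values from the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of stacked stride-1 1-D convolutions.

    Each layer with kernel size k and dilation d widens the field by (k - 1) * d.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def conv_params(kernel_size, channels, num_layers):
    """Weights plus biases for num_layers convs mapping `channels` -> `channels`.

    Dilation does not appear here: it spaces out the taps but adds no weights.
    """
    return num_layers * (kernel_size * channels * channels + channels)

# Illustrative configuration (not taken from the paper): 4 layers, k = 3, 64 channels.
standard = receptive_field(3, [1, 1, 1, 1])   # 9 frames
dilated = receptive_field(3, [1, 2, 4, 8])    # 31 frames
params = conv_params(3, 64, 4)                # identical for both stacks
```

Doubling the dilation per layer makes the receptive field grow exponentially with depth while the parameter count grows only linearly, which is the trade-off a fixed-budget comparison exposes.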
Findings
Stacked hourglass networks achieve an 8.9% relative WER reduction over the standard CNN.
Large receptive field CNNs outperform standard CNNs in distant speech tasks.
Experiments on distant speech simulated with realistic room impulse responses (RIRs) confirm the improvements.
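The 8.9% figure is a relative reduction, not an absolute drop in WER points. The sketch below computes WER as (substitutions + deletions + insertions) / reference words via a word-level Levenshtein distance, then the relative reduction between two systems. Function names are illustrative, and the example numbers are hypothetical, not results from the paper.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / len(ref)

def relative_wer_reduction(baseline_wer, new_wer):
    """Relative reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer
```

For example, with hypothetical WERs of 10.0% (baseline) and 9.11% (new system), `relative_wer_reduction(10.0, 9.11)` is about 8.9, even though the absolute drop is only 0.89 points.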
Abstract
Despite significant efforts over the last few years to build a robust automatic speech recognition (ASR) system for different acoustic settings, the performance of the current state-of-the-art technologies significantly degrades in noisy reverberant environments. Convolutional Neural Networks (CNNs) have been successfully used to achieve substantial improvements in many speech processing applications including distant speech recognition (DSR). However, standard CNN architectures were not efficient in capturing long-term speech dynamics, which are essential in the design of a robust DSR system. In the present study, we address this issue by investigating variants of large receptive field CNNs (LRF-CNNs) which include deeply recursive networks, dilated convolutional neural networks, and stacked hourglass networks. To compare the efficacy of the aforementioned architectures with the…
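Of the three variants named in the abstract, the deeply recursive network is the easiest to show in a few lines. Assuming it follows the usual weight-sharing scheme (one convolutional layer applied repeatedly with the same weights), each pass widens the receptive field while the parameter count stays fixed. The sketch below uses a single-channel 1-D convolution and a hypothetical `recursive_block` helper; it is not the paper's implementation.

```python
def conv1d_same(x, kernel):
    """Zero-padded 'same' 1-D convolution (cross-correlation, as in CNN layers).
    Assumes an odd kernel length so the output has the input's length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(w * padded[i + j] for j, w in enumerate(kernel)) for i in range(len(x))]

def relu(v):
    return [max(0.0, a) for a in v]

def recursive_block(x, kernel, num_recursions):
    """Apply the SAME kernel num_recursions times (weight sharing).
    Parameters stay len(kernel); the receptive field grows by len(kernel) - 1 per pass."""
    h = list(x)
    for _ in range(num_recursions):
        h = relu(conv1d_same(h, kernel))
    return h

# An impulse reveals the receptive field: count the outputs it influences.
impulse = [0.0] * 21
impulse[10] = 1.0
out = recursive_block(impulse, kernel=[0.5, 1.0, 0.5], num_recursions=3)
support = sum(1 for v in out if v != 0.0)  # 1 + 3 * (3 - 1) = 7
```

Three distinct layers of the same size would reach the same 7-frame field but with three times the weights; recursion trades that extra capacity for a fixed parameter budget.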
Taxonomy
Methods: 1x1 Convolution · Max Pooling · Residual Connection · Convolution · Hourglass Module · Stacked Hourglass Network
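The hourglass module combines several of the listed methods: max pooling to move to coarser time resolutions, upsampling back, and additive skip (residual) paths at each scale, with full hourglasses stacked in sequence. The 1-D sketch below illustrates that structure under simplifying assumptions: `layer` is a hypothetical stand-in for the learned convolution or residual block, and stacking uses a plain residual addition rather than whatever intermediate 1x1 convolutions the paper's network may use.

```python
def max_pool2(x):
    """Non-overlapping max pooling (window 2, stride 2); assumes len(x) is even."""
    return [max(x[i], x[i + 1]) for i in range(0, len(x), 2)]

def upsample2(x):
    """Nearest-neighbour upsampling by a factor of 2."""
    out = []
    for v in x:
        out.extend([v, v])
    return out

def hourglass(x, depth, layer):
    """Recursive hourglass module on a 1-D sequence.

    Upper branch: `layer` at full resolution (the skip path).
    Lower branch: pool -> `layer` -> recurse -> upsample.
    The two branches are merged by elementwise addition.
    Assumes len(x) is divisible by 2 ** depth.
    """
    upper = layer(x)
    if depth == 0:
        return upper
    lower = hourglass(layer(max_pool2(x)), depth - 1, layer)
    return [u + l for u, l in zip(upper, upsample2(lower))]

def stacked_hourglass(x, num_stacks, depth, layer):
    """Stack hourglass modules, each wrapped in a residual connection."""
    for _ in range(num_stacks):
        x = [a + b for a, b in zip(x, hourglass(x, depth, layer))]
    return x

def identity(v):
    """Hypothetical stand-in for a learned conv/residual block."""
    return list(v)
```

With `depth=3`, an impulse at the last of 8 frames, `hourglass([0.0] * 7 + [1.0], 3, identity)`, yields `[1, 1, 1, 1, 2, 2, 3, 4]`: the pooled branches carry it all the way to the first frame, whereas the full-resolution path alone would keep it local.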
