Densely Connected Convolutional Networks for Speech Recognition
Chia Yu Li, Ngoc Thang Vu

TL;DR
This paper applies densely connected convolutional networks (DenseNets) to acoustic modeling in speech recognition, showing significant improvements over existing neural network models and strong data efficiency.
Contribution
It introduces DenseNets for speech recognition acoustic modeling and shows they outperform traditional models even with less training data.
Findings
DenseNets outperform DNNs, CNNs, and VGGs as acoustic models for speech recognition.
On Wall Street Journal, a DenseNet trained on only half of the training data outperforms other models trained on the full data set.
DenseNets significantly outperform other models on the Wall Street Journal dataset.
Abstract
This paper presents our latest investigation on Densely Connected Convolutional Networks (DenseNets) for acoustic modelling (AM) in automatic speech recognition. DenseNets are very deep, compact convolutional neural networks, which have demonstrated incredible improvements over the state-of-the-art results on several data sets in computer vision. Our experimental results show that DenseNet can be used for AM, significantly outperforming other neural-based models such as DNNs, CNNs, and VGGs. Furthermore, results on Wall Street Journal revealed that with only half of the training data DenseNet was able to outperform other models trained with the full data set by a large margin.
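The dense connectivity that gives DenseNets their name can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy "layer", and the numbers (16 input channels, growth rate 12, 4 layers) are assumptions chosen for illustration only. It shows the general DenseNet idea that each layer receives the concatenation of the block input and the outputs of all preceding layers, so the input width grows by a fixed growth rate per layer.

```python
from typing import Callable, List


def dense_block_channels(in_channels: int, growth_rate: int, num_layers: int) -> List[int]:
    """Number of input channels seen by each layer of a dense block.

    Layer i (0-based) receives the block input concatenated with the outputs
    of the i preceding layers, each contributing `growth_rate` channels.
    All argument values used below are illustrative, not taken from the paper.
    """
    return [in_channels + i * growth_rate for i in range(num_layers)]


def dense_block_forward(
    x: List[float], layers: List[Callable[[List[float]], List[float]]]
) -> List[float]:
    """Toy forward pass over a flat list of 'channels'.

    Every layer consumes all features produced so far, and its output is
    appended (a concatenated skip connection) rather than added.
    """
    features = list(x)
    for layer in layers:
        new_features = layer(features)      # layer sees everything so far
        features = features + new_features  # concatenate, do not sum
    return features


def sum_layer(features: List[float]) -> List[float]:
    """Hypothetical stand-in for conv + nonlinearity: emits one new channel."""
    return [sum(features)]


if __name__ == "__main__":
    print(dense_block_channels(16, 12, 4))
    print(dense_block_forward([1.0, 2.0], [sum_layer, sum_layer]))
```

Because outputs are concatenated instead of summed, each layer can stay narrow while still having access to all earlier features, which is one reason such networks are described as compact.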
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Attention Model · Batch Normalization · Convolution · Average Pooling · Concatenated Skip Connection · Global Average Pooling · Dense Block · Kaiming Initialization · 1x1 Convolution
