Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition
Xiaodong Cui, Wei Zhang, Ulrich Finkler, George Saon, Michael Picheny, David Kung

TL;DR
This paper reviews distributed training techniques for deep neural network acoustic models in automatic speech recognition, focusing on balancing communication and computation to improve training efficiency and performance.
Contribution
It provides a comprehensive overview of distributed training strategies for ASR acoustic models and evaluates their effectiveness in high-performance computing environments.
Findings
Distributed training strategies can significantly improve training speed.
Balancing communication and computation is crucial for efficiency.
Experiments on a popular public benchmark study the convergence, speedup, and recognition performance of the investigated strategies.
Abstract
The past decade has witnessed great progress in Automatic Speech Recognition (ASR) due to advances in deep learning. The improvements in performance can be attributed to both improved models and large-scale training data. Key to training such models is the employment of efficient distributed learning techniques. In this article, we provide an overview of distributed training techniques for deep neural network acoustic models for ASR. Starting with the fundamentals of data parallel stochastic gradient descent (SGD) and ASR acoustic modeling, we will investigate various distributed training strategies and their realizations in high performance computing (HPC) environments with an emphasis on striking the balance between communication and computation. Experiments are carried out on a popular public benchmark to study the convergence, speedup and recognition performance of the investigated…
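As a rough illustration of the data parallel SGD idea the abstract starts from, the sketch below simulates synchronous training in a single process: each worker computes a gradient on a mini-batch from its own data shard, the gradients are averaged (a stand-in for an allreduce), and every worker applies the same update. This is a minimal sketch under stated assumptions, not the paper's implementation. The toy one-parameter least-squares model and all names (`local_gradient`, `allreduce_mean`, `data_parallel_sgd`, `make_toy_data`) are hypothetical and chosen only for illustration.

```python
import random


def local_gradient(w, batch):
    """Mean gradient of 0.5 * (w * x - y)**2 over one worker's mini-batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def allreduce_mean(values):
    """Stand-in for an allreduce: returns the mean every worker would receive."""
    return sum(values) / len(values)


def data_parallel_sgd(data, num_workers=4, lr=0.5, steps=200,
                      batch_per_worker=8, seed=0):
    """Simulate synchronous data parallel SGD on a one-parameter model."""
    rng = random.Random(seed)
    # Assumption: data is split round-robin so each worker owns one shard.
    shards = [data[i::num_workers] for i in range(num_workers)]
    w = 0.0  # all workers hold an identical copy of the parameter
    for _ in range(steps):
        grads = []
        for shard in shards:
            # Each worker samples a mini-batch from its own shard only.
            batch = rng.sample(shard, min(batch_per_worker, len(shard)))
            grads.append(local_gradient(w, batch))
        # Communication step: average gradients, then apply the same update.
        w -= lr * allreduce_mean(grads)
    return w


def make_toy_data(n=256, true_w=3.0, seed=1):
    """Noise-free samples of y = true_w * x (hypothetical toy data)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, true_w * x) for x in xs]


if __name__ == "__main__":
    w = data_parallel_sgd(make_toy_data())
    print(f"estimated w = {w:.4f}")
```

In a real multi-node setting the averaging step is where communication cost arises, which is the communication versus computation trade-off the abstract emphasizes. Here it is a plain in-process mean.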
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
