Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition

Xiaodong Cui; Wei Zhang; Ulrich Finkler; George Saon; Michael Picheny; David Kung

arXiv:2002.10502·cs.DC·February 26, 2020·1 cites

Open Access

TL;DR

This paper reviews distributed training techniques for deep neural network acoustic models in automatic speech recognition, focusing on balancing communication and computation to improve training efficiency and performance.

Contribution

It provides a comprehensive overview of distributed training strategies for ASR acoustic models and evaluates their effectiveness in high-performance computing environments.

Findings

1. Distributed training strategies can significantly improve training speed.

2. Balancing communication and computation is crucial for efficiency.

3. Experimental results demonstrate convergence and recognition performance improvements.

Abstract

The past decade has witnessed great progress in Automatic Speech Recognition (ASR) due to advances in deep learning. The improvements in performance can be attributed to both improved models and large-scale training data. Key to training such models is the employment of efficient distributed learning techniques. In this article, we provide an overview of distributed training techniques for deep neural network acoustic models for ASR. Starting with the fundamentals of data parallel stochastic gradient descent (SGD) and ASR acoustic modeling, we will investigate various distributed training strategies and their realizations in high performance computing (HPC) environments with an emphasis on striking the balance between communication and computation. Experiments are carried out on a popular public benchmark to study the convergence, speedup and recognition performance of the investigated…
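
As a rough illustration of the data parallel SGD that the abstract names as its starting point, the sketch below simulates several learners in a single process. Each learner draws a mini-batch from its own data shard, the local gradients are averaged (standing in for an allreduce), and every model replica applies the same update. The toy 1-D linear model, the function names, and all hyperparameters are assumptions made for illustration; none of them come from the paper, which concerns DNN acoustic models on HPC systems.

```python
import random

def gradient(w, batch):
    """Mean-squared-error gradient of the toy model y = w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def allreduce_mean(values):
    """Single-process stand-in for an allreduce: every learner gets the mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def data_parallel_sgd(data, num_learners=4, lr=0.1, steps=200, batch_size=8, seed=0):
    """Synchronous data parallel SGD (hypothetical toy setup, not the paper's)."""
    rng = random.Random(seed)
    # Partition the training data into one shard per learner.
    shards = [data[i::num_learners] for i in range(num_learners)]
    w = 0.0  # all learners hold an identical replica of the model
    for _ in range(steps):
        # Each learner computes a gradient on a mini-batch from its own shard.
        local_grads = [gradient(w, rng.sample(shard, batch_size)) for shard in shards]
        # Communication step: average the gradients across learners.
        averaged = allreduce_mean(local_grads)[0]
        # Every replica applies the same update, so the replicas stay in sync.
        w -= lr * averaged
    return w

def make_data(n=256, true_w=3.0, seed=1):
    """Noise-free synthetic data for the toy model (an assumption for the demo)."""
    rng = random.Random(seed)
    return [(x, true_w * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]

data = make_data()
w_hat = data_parallel_sgd(data)
print(f"estimated weight: {w_hat:.4f}")
```

In this synchronous form every step waits for the gradient average, which is where the balance between communication and computation mentioned in the abstract comes into play: more learners shorten the computation per step but add communication cost.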

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing