Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling

Wenpeng Li; BinBin Zhang; Lei Xie; Dong Yu

arXiv:1703.05880·cs.CL·December 6, 2018·1 cites

TL;DR

This paper systematically compares four parallel training algorithms for deep learning-based speech recognition models, providing practical guidance on their efficiency, stability, and scalability for large datasets and models.

Contribution

It offers a comprehensive empirical evaluation of ASGD, BMUF, BSP, and EASGD on speech recognition tasks, highlighting BMUF as the most effective method.

Findings

1. BMUF is the most stable and scalable algorithm.

2. BMUF often outperforms single-GPU SGD.

3. ASGD can be a viable alternative in some scenarios.

Abstract

Deep learning models (DLMs) are state-of-the-art techniques in speech recognition. However, training good DLMs can be time-consuming, especially for production-size models and corpora. Although several parallel training algorithms have been proposed to improve training efficiency, there is no clear guidance on which one to choose for the task at hand, due to the lack of a systematic and fair comparison among them. In this paper, we aim to fill this gap by comparing four popular parallel training algorithms in speech recognition, namely asynchronous stochastic gradient descent (ASGD), blockwise model-update filtering (BMUF), bulk synchronous parallel (BSP) and elastic averaging stochastic gradient descent (EASGD), on the 1000-hour LibriSpeech corpus using feed-forward deep neural networks (DNNs) and convolutional, long short-term memory, deep neural networks (CLDNNs). Based on our experiments, we recommend using…
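The abstract names BMUF without describing its update rule. As a rough illustration only, the sketch below follows the commonly cited formulation of blockwise model-update filtering: after each data block, the worker models are averaged, the change relative to the broadcast model is smoothed with a block-level momentum, and the result is applied to the global model. The function name `bmuf_step`, its parameters, and the default values are assumptions made for this example and are not taken from the paper or this page.

```python
from typing import List, Tuple

Vector = List[float]


def bmuf_step(
    model: Vector,              # global model after the previous block
    broadcast: Vector,          # model the workers started this block from
    delta: Vector,              # filtered update from the previous block
    worker_models: List[Vector],
    block_momentum: float = 0.9,  # hypothetical default
    block_lr: float = 1.0,        # hypothetical default
    nesterov: bool = True,
) -> Tuple[Vector, Vector, Vector]:
    """One block-level synchronization step (illustrative sketch)."""
    n = len(worker_models)
    # Average the models the workers produced on their data splits.
    avg = [sum(w[i] for w in worker_models) / n for i in range(len(model))]
    # Aggregated update of this block, relative to the broadcast model.
    g = [a - b for a, b in zip(avg, broadcast)]
    # Filter the block update with block-level momentum.
    new_delta = [block_momentum * d + block_lr * gi for d, gi in zip(delta, g)]
    # Apply the filtered update to the global model.
    new_model = [m + d for m, d in zip(model, new_delta)]
    if nesterov:
        # Nesterov-style variant: broadcast a look-ahead model to workers.
        new_broadcast = [
            m + block_momentum * d for m, d in zip(new_model, new_delta)
        ]
    else:
        # Classical variant: workers restart from the global model itself.
        new_broadcast = list(new_model)
    return new_model, new_broadcast, new_delta
```

In a training loop under these assumptions, each worker would run ordinary SGD from `broadcast` on its share of the block, and `bmuf_step` would then be called once per block with the resulting worker models.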

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Stochastic Gradient Descent