Partitioned Gradient Matching-based Data Subset Selection for Compute-Efficient Robust ASR Training
Ashish Mittal, Durga Sivasubramanian, Rishabh Iyer, Preethi Jyothi, and Ganesh Ramakrishnan

TL;DR
This paper introduces Partitioned Gradient Matching (PGM), a scalable data subset selection method for training robust and efficient RNN-T speech recognition models, achieving 3x to 6x training speedups with minimal accuracy loss.
Contribution
The paper presents PGM, a novel distributable data subset selection (DSS) algorithm tailored to large datasets and RNN-T models, enabling faster training with minimal performance degradation.
Findings
PGM achieves 3x to 6x speedup in training.
PGM keeps the WER degradation under 1%.
PGM performs well even with noisy training data.
Abstract
Training state-of-the-art ASR systems such as RNN-T often carries high financial and environmental costs. Training on a subset of the training data could mitigate this problem if the selected subset achieves performance on par with training on the entire dataset. Although there are many data subset selection (DSS) algorithms, applying them directly to RNN-T is difficult, especially adaptive DSS algorithms that use learning dynamics such as gradients, because RNN-T models tend to have gradients with a significantly larger memory footprint. In this paper, we propose Partitioned Gradient Matching (PGM), a novel distributable DSS algorithm suitable for massive datasets like those used to train RNN-T. Through extensive experiments on Librispeech 100H and Librispeech 960H, we show that PGM achieves between 3x and 6x speedup with only a very small accuracy degradation (under 1%…
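To make the idea of "partitioned" gradient matching concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes that the data are split into disjoint partitions, and that within each partition a small weighted subset is chosen so that its weighted gradient sum approximates the partition's full gradient sum. GradMatch-style methods typically use orthogonal matching pursuit (OMP) for this step. For brevity, the sketch swaps in plain greedy matching pursuit without repeats, a simpler relative of OMP. The helper names (`dot`, `matching_pursuit`, `partitioned_gradient_matching`), the strided random partitioning, and the random stand-in gradient vectors are all hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(grads, target, budget):
    """Greedily pick up to `budget` distinct rows of `grads`, each with a positive
    weight, so that their weighted sum approximates `target` (plain matching
    pursuit; a simplified stand-in for OMP, not PGM's actual solver)."""
    residual = list(target)
    chosen, selected, weights = set(), [], []
    for _ in range(budget):
        best, best_score = None, 0.0
        for i, g in enumerate(grads):
            if i in chosen:
                continue
            norm = math.sqrt(dot(g, g))
            if norm == 0.0:
                continue
            score = dot(g, residual) / norm  # normalised alignment with residual
            if score > best_score:
                best, best_score = i, score
        if best is None:  # no remaining gradient reduces the residual
            break
        g = grads[best]
        w = dot(g, residual) / dot(g, g)  # optimal step along g; positive here
        residual = [r - w * x for r, x in zip(residual, g)]
        chosen.add(best)
        selected.append(best)
        weights.append(w)
    return selected, weights, residual


def partitioned_gradient_matching(grads, num_partitions, fraction, seed=0):
    """Hypothetical partitioned variant: shuffle indices, split them into disjoint
    partitions, match each partition's summed gradient independently, and return
    global indices with their weights."""
    order = list(range(len(grads)))
    random.Random(seed).shuffle(order)
    parts = [order[p::num_partitions] for p in range(num_partitions)]
    subset, subset_weights = [], []
    for part in parts:  # partitions are independent, so each could run on its own worker
        local = [grads[i] for i in part]
        target = [sum(col) for col in zip(*local)]  # full-partition gradient sum
        budget = max(1, round(fraction * len(part)))
        sel, w, _ = matching_pursuit(local, target, budget)
        subset.extend(part[j] for j in sel)
        subset_weights.extend(w)
    return subset, subset_weights


if __name__ == "__main__":
    rng = random.Random(0)
    # Random stand-ins for per-example gradient vectors.
    grads = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(400)]
    idx, w = partitioned_gradient_matching(grads, num_partitions=4, fraction=0.1)
    print(len(idx), "examples selected out of", len(grads))
```

The point of partitioning in this sketch is that each partition only needs the gradients of its own examples in memory, and the partitions never communicate. This is one plausible reading of why a partitioned scheme is "distributable" and eases the memory pressure of large RNN-T gradients. The selected examples would then be used for training, with their loss terms scaled by the returned weights.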
Taxonomy
Topics: Machine Learning and ELM · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms
Methods: Partitioned Gradient Matching
