Partitioned Gradient Matching-based Data Subset Selection for Compute-Efficient Robust ASR Training

Ashish Mittal; Durga Sivasubramanian; Rishabh Iyer; Preethi Jyothi; Ganesh Ramakrishnan

arXiv:2210.16892 · cs.LG · November 1, 2022

TL;DR

This paper introduces Partitioned Gradient Matching (PGM), a distributable data subset selection method for training robust RNN-T speech recognition models efficiently, achieving a 3x to 6x training speedup with under 1% accuracy degradation.

Contribution

The paper presents PGM, a novel distributable data subset selection (DSS) algorithm suited to massive datasets and RNN-T models, enabling faster training with only a small accuracy degradation.

Findings

01

PGM achieves a 3x to 6x training speedup.

02

PGM keeps the accuracy degradation, measured in WER, under 1%.

03

PGM performs well even with noisy training data.
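The accuracy finding above is stated in terms of WER (word error rate), the standard ASR metric: the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. A minimal sketch of that computation follows; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, not something taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumes whitespace tokenization and a non-empty reference (illustrative choices).
    """
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                         # deletion
                curr[j - 1] + 1,                     # insertion
                prev[j - 1] + (ref_word != hyp_word) # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

In the usage line, the hypothesis differs from the reference by one substitution and one deletion over six reference words.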

Abstract

Training state-of-the-art ASR systems such as RNN-T often has a high associated financial and environmental cost. Training with a subset of the training data could mitigate this problem if the selected subset could achieve performance on par with training on the entire dataset. Although there are many data subset selection (DSS) algorithms, applying them directly to RNN-T is difficult, especially for DSS algorithms that are adaptive and use learning dynamics such as gradients, because RNN-T gradients tend to have a significantly larger memory footprint. In this paper, we propose Partitioned Gradient Matching (PGM), a novel distributable DSS algorithm suitable for massive datasets like those used to train RNN-T. Through extensive experiments on Librispeech 100H and Librispeech 960H, we show that PGM achieves a 3x to 6x speedup with only a very small accuracy degradation (under 1%…
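The abstract does not spell out how PGM works. As a rough illustration of the idea the name suggests, the sketch below splits per-example gradients into partitions, greedily picks a weighted subset inside each partition whose combined gradient approximates that partition's mean gradient, and returns the union of the picks. Everything here is an assumption made for illustration: the function names `omp_select` and `partitioned_gradient_matching`, random partitioning, equal per-partition budgets, unconstrained least-squares weights, and small dense vectors standing in for RNN-T gradients. It should not be read as the authors' implementation.

```python
import numpy as np


def omp_select(grads: np.ndarray, budget: int):
    """Greedy matching pursuit over the rows of `grads` (hypothetical helper).

    Picks up to `budget` rows and least-squares weights so that the weighted
    sum of the picked rows approximates the mean of all rows.
    """
    target = grads.mean(axis=0)
    residual = target.copy()
    selected = []
    weights = np.zeros(0)
    for _ in range(min(budget, len(grads))):
        # Score each example by how well its gradient aligns with the residual.
        scores = grads @ residual
        if selected:
            scores[selected] = -np.inf  # never pick the same example twice
        selected.append(int(np.argmax(scores)))
        # Refit the weights of all picked examples against the target.
        basis = grads[selected].T  # shape: (dim, num_selected)
        weights, *_ = np.linalg.lstsq(basis, target, rcond=None)
        residual = target - basis @ weights
    return selected, weights


def partitioned_gradient_matching(grads, num_partitions, budget, seed=0):
    """Run the matching independently per partition and merge the results."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(grads))
    per_partition = budget // num_partitions  # equal split: an assumption
    chosen, chosen_weights = [], []
    for part in np.array_split(order, num_partitions):
        local_idx, local_w = omp_select(grads[part], per_partition)
        chosen.extend(part[local_idx].tolist())  # map back to global indices
        chosen_weights.extend(local_w.tolist())
    return chosen, chosen_weights


if __name__ == "__main__":
    toy_grads = np.random.default_rng(1).normal(size=(40, 8))
    idx, w = partitioned_gradient_matching(toy_grads, num_partitions=4, budget=12)
    print(sorted(idx))
```

Because each partition is processed on its own, the loop body could in principle run on separate workers, and no worker would need to hold the gradients of the whole dataset. That is the property a distributable method would need given the gradient memory footprint the abstract mentions.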

Taxonomy

Topics: Machine Learning and ELM · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms

Methods: Partitioned Gradient Matching