Dynamic Data Pruning for Automatic Speech Recognition

Qiao Xiao; Pingchuan Ma; Adriana Fernandez-Lopez; Boqian Wu; Lu Yin; Stavros Petridis; Mykola Pechenizkiy; Maja Pantic; Decebal Constantin Mocanu; and Shiwei Liu

arXiv:2406.18373 · cs.CL · June 27, 2024

Open Access

TL;DR

This paper introduces a novel dynamic data pruning method for automatic speech recognition that reduces training data size and computational costs while maintaining performance.

Contribution

The paper is the first to explore dynamic data pruning in ASR. It proposes DDP-ASR, which offers fine-grained pruning granularities tailored to speech datasets and saves up to 1.6x training time.

Findings

01. Matches full-data performance while dynamically selecting only 70% of the training data

02. Trains up to 1.6x faster with negligible accuracy loss

03. Introduces fine-grained pruning granularities for speech data

Abstract

The recent success of Automatic Speech Recognition (ASR) is largely attributed to the ever-growing amount of training data. However, this trend has made model training prohibitively costly and imposed computational demands. While data pruning has been proposed to mitigate this issue by identifying a small subset of relevant data, its application in ASR has been barely explored, and existing works often entail significant overhead to achieve meaningful results. To fill this gap, this paper presents the first investigation of dynamic data pruning for ASR, finding that we can reach the full-data performance by dynamically selecting 70% of data. Furthermore, we introduce Dynamic Data Pruning for ASR (DDP-ASR), which offers several fine-grained pruning granularities specifically tailored for speech-related datasets, going beyond the conventional pruning of entire time sequences. Our…
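The idea of dynamically selecting a subset of the data, as described in the abstract, can be illustrated with a minimal sketch. This is not the DDP-ASR procedure: the scoring rule (most recent per-utterance loss), the `explore` fraction, the function names, and the toy `loss_fn` are all assumptions made for illustration. The only property taken from the text is that the kept subset is re-chosen during training rather than fixed once up front.

```python
import random


def select_subset(scores, keep_ratio, rng, explore=0.5):
    """Choose which sample indices to train on in the current epoch.

    Assumption: part of the budget goes to the highest-scoring (hardest)
    samples, and the rest is drawn at random from the remaining samples so
    that pruned samples can re-enter the training set later.
    """
    n = len(scores)
    n_keep = max(1, round(keep_ratio * n))
    n_hard = int(n_keep * (1 - explore))
    # Rank indices from highest score to lowest.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hard = ranked[:n_hard]
    # Fill the remaining budget with a random draw from the other samples.
    sampled = rng.sample(ranked[n_hard:], n_keep - n_hard)
    return sorted(hard + sampled)


def train_with_dynamic_pruning(num_samples, epochs, keep_ratio, loss_fn, seed=0):
    """Re-select the training subset every epoch and refresh its scores."""
    rng = random.Random(seed)
    # Unseen samples start with an infinite score so they rank as hardest.
    scores = [float("inf")] * num_samples
    history = []
    for epoch in range(epochs):
        kept = select_subset(scores, keep_ratio, rng)
        for i in kept:
            # Stand-in for a real training step that returns the sample loss.
            scores[i] = loss_fn(i, epoch)
        history.append(kept)
    return history


if __name__ == "__main__":
    # Toy loss: hypothetical values that shrink with index and epoch.
    runs = train_with_dynamic_pruning(10, 3, 0.7, lambda i, e: 1.0 / (i + 1 + e))
    for epoch, kept in enumerate(runs):
        print(epoch, kept)
```

With `keep_ratio=0.7`, each epoch touches 70% of the samples, but the membership of that subset changes from epoch to epoch. This is what distinguishes dynamic pruning from a static, one-time selection.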

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Pruning