Neural Language Model Pruning for Automatic Speech Recognition
Leonardo Emili, Thiago Fraga-Silva, Ernest Pusateri, Markus Nußbaum-Thom, Youssef Oualil

TL;DR
This paper investigates various pruning techniques for Transformer-based neural language models in speech recognition, analyzing their impact on accuracy and speed, and introduces a low-rank approximation method for incremental model compression.
Contribution
It provides an in-depth analysis of pruning criteria, methods, and schedulers for large-scale speech recognition models and proposes a novel low-rank approximation variant for flexible model compression.
Findings
Data-driven pruning outperforms magnitude-driven pruning in several scenarios.
Incremental pruning yields higher accuracy than one-shot pruning, especially when targeting smaller model sizes.
Low-rank approximation offers the best size-speed trade-off.
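To make the one-shot versus incremental distinction concrete, here is a minimal NumPy sketch of magnitude-driven pruning. It is an illustration only, not the authors' implementation: the function names, the linear sparsity schedule, and the optional `fine_tune` hook are all assumptions made for this example. In practice the benefit of incremental pruning comes from training between steps, which the hook stands in for.

```python
import numpy as np


def magnitude_mask(weights, sparsity):
    """Boolean mask that drops the `sparsity` fraction of smallest-magnitude weights."""
    k = int(round(sparsity * weights.size))  # number of weights to remove
    flat = np.abs(weights).ravel()
    order = np.argsort(flat, kind="stable")  # indices from smallest to largest magnitude
    mask = np.ones(flat.size, dtype=bool)
    mask[order[:k]] = False
    return mask.reshape(weights.shape)


def linear_schedule(target, steps):
    """Hypothetical scheduler: sparsity grows linearly up to `target`."""
    return [target * (i + 1) / steps for i in range(steps)]


def prune_one_shot(weights, target):
    """Remove all pruned weights in a single step."""
    return weights * magnitude_mask(weights, target)


def prune_incrementally(weights, target, steps, fine_tune=None):
    """Prune in several steps, optionally fine-tuning between steps.

    `fine_tune` is a placeholder for a training routine (assumption);
    the mask is re-applied afterwards so pruned weights stay at zero.
    """
    w = weights.copy()
    for sparsity in linear_schedule(target, steps):
        mask = magnitude_mask(w, sparsity)
        w = w * mask
        if fine_tune is not None:
            w = fine_tune(w) * mask
    return w
```

Without a `fine_tune` routine, the two functions return the same result, since weights already set to zero are always the first to be selected again. A data-driven criterion would replace the absolute value in `magnitude_mask` with a score computed from training data; this summary does not say which score the paper uses.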
Abstract
We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition. We explore three aspects of the pruning framework, namely criterion, method and scheduler, analyzing their contribution in terms of accuracy and inference speed. To the best of our knowledge, such in-depth analyses on large-scale recognition systems have not been reported in the literature. In addition, we propose a variant of low-rank approximation suitable for incrementally compressing models, and delivering multiple models with varied target sizes. Among other results, we show that a) data-driven pruning outperforms magnitude-driven in several scenarios; b) incremental pruning achieves higher accuracy compared to one-shot pruning, especially when targeting smaller sizes; and c) low-rank approximation presents the best trade-off between size reduction and…
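For readers unfamiliar with low-rank approximation, the sketch below shows the generic idea using a truncated SVD: one weight matrix is replaced by the product of two thinner matrices. This is the textbook technique, not the variant proposed in the paper, whose details are not given in this summary. The function names are assumptions made for illustration.

```python
import numpy as np


def low_rank_factorize(weight, rank):
    """Approximate an (m, n) matrix as a @ b with a: (m, rank), b: (rank, n)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold the singular values into the left factor
    b = vt[:rank, :]
    return a, b


def factorized_parameter_count(m, n, rank):
    """Parameters stored by the two factors, versus m * n for the dense matrix."""
    return rank * (m + n)
```

Factorizing a 64 x 64 matrix at rank 8 stores 8 * (64 + 64) = 1024 values instead of 4096. Because singular values come out sorted, keeping fewer columns of `a` and fewer rows of `b` gives a smaller approximation without recomputing the SVD; whether the paper's incremental variant works this way is not stated here.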
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Pruning
