Sparsification via Compressed Sensing for Automatic Speech Recognition
Kai Zhen (1, 2), Hieu Duy Nguyen (2), Feng-Ju Chang (2), Athanasios Mouchtaris (2), and Ariya Rastrow (2). ((1) Indiana University Bloomington, (2) Alexa Machine Learning, Amazon, USA)

TL;DR
This paper introduces a compressed sensing based pruning method for automatic speech recognition (ASR) models. The method reduces model size and latency while maintaining accuracy, and it outperforms existing pruning techniques.
Contribution
It proposes a novel compressed sensing based pruning (CSP) approach that integrates compressed sensing into model training to improve sparse pruning of ASR models.
Findings
CSP outperforms existing pruning methods in ASR tasks.
The approach reduces model size and latency effectively.
It maintains high accuracy despite aggressive sparsification.
Abstract
To achieve high accuracy in machine learning (ML) applications, it is essential to employ models with a large number of parameters. Certain applications, such as Automatic Speech Recognition (ASR), however, require real-time interaction with users, which compels the model to have as low a latency as possible. Deploying large-scale ML applications thus necessitates model quantization and compression, especially when running ML models on resource-constrained devices. For example, by forcing some of the model weight values to zero, it is possible to apply zero-weight compression, which reduces both the model size and the time needed to read the model from memory. In the literature, such methods are referred to as sparse pruning. The fundamental questions are when and which weights should be forced to zero, i.e., be pruned. In this work, we propose a compressed sensing based pruning (CSP)…
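The idea of forcing weights to zero and then applying zero-weight compression can be illustrated with a small sketch. This is not the CSP method from the paper; it uses plain magnitude pruning (zero the smallest-magnitude weights) as a stand-in, and the function names `magnitude_prune`, `compress`, and `decompress` are hypothetical, chosen only for this example.

```python
# Illustrative only: magnitude pruning as a stand-in, NOT the paper's CSP method.
# All function names here are hypothetical.

def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

def compress(weights):
    """Zero-weight compression: keep only nonzero entries as (index, value) pairs."""
    pairs = [(i, w) for i, w in enumerate(weights) if w != 0.0]
    return len(weights), pairs

def decompress(length, pairs):
    """Rebuild the dense weight list from its compressed form."""
    dense = [0.0] * length
    for i, w in pairs:
        dense[i] = w
    return dense

weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.0, -0.9]
pruned = magnitude_prune(weights, sparsity=0.5)
length, pairs = compress(pruned)
restored = decompress(length, pairs)
print(f"stored {len(pairs)} of {length} weights")
```

With half of the weights pruned, only the surviving nonzero values and their positions need to be stored and read from memory. Which weights to zero, and at what point in training, is the question the abstract raises; magnitude ranking is just one simple answer.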
Taxonomy
Methods: Pruning
