Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet
Haichuan Yang, Yuan Shangguan, Dilin Wang, Meng Li, Pierce Chuang, Xiaohui Zhang, Ganesh Venkatesh, Ozlem Kalinli, Vikas Chandra

TL;DR
This paper introduces Omni-sparsity DNN, a single neural network that can be efficiently pruned to produce optimized models across a range of sizes for on-device streaming ASR, saving training resources while maintaining high accuracy.
Contribution
It proposes a novel training strategy for a unified sparse DNN that efficiently generates multiple model sizes along the accuracy-resource Pareto front.
Findings
Achieves 2%-6.6% better WER on LibriSpeech Test-other than individually pruned sparse models.
Reduces training time and resources compared to training multiple models.
Maintains or improves accuracy across different model sizes.
Abstract
From wearables to powerful smart devices, modern automatic speech recognition (ASR) models run on a variety of edge devices with different computational budgets. To navigate the Pareto front of model accuracy vs. model size, researchers are trapped in a dilemma of optimizing model accuracy by training and fine-tuning models for each individual edge device while keeping the training GPU-hours tractable. In this paper, we propose Omni-sparsity DNN, where a single neural network can be pruned to generate optimized models for a large range of model sizes. We develop training strategies for Omni-sparsity DNN that allow it to find models along the Pareto front of word-error-rate (WER) vs. model size while keeping the training GPU-hours to no more than that of training one singular model. We demonstrate the Omni-sparsity DNN with streaming E2E ASR models. Our results show great saving on…
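The idea of pruning one shared set of weights to several model sizes can be illustrated with a minimal sketch. The abstract does not state the pruning criterion or the training strategy, so this example assumes plain unstructured magnitude pruning purely for illustration; `magnitude_mask` and `prune` are hypothetical helper names, not the authors' code, and no training is shown.

```python
import random


def magnitude_mask(weights, sparsity):
    """Return a keep/drop mask that drops the `sparsity` fraction of
    smallest-magnitude weights (assumed criterion, not from the paper)."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    num_pruned = round(sparsity * len(weights))
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_pruned])
    return [i not in pruned for i in range(len(weights))]


def prune(weights, sparsity):
    """Zero out the masked weights, leaving the shared weights untouched."""
    mask = magnitude_mask(weights, sparsity)
    return [w if keep else 0.0 for w, keep in zip(weights, mask)]


# One shared (flattened) weight vector stands in for the single network.
random.seed(0)
shared = [random.gauss(0.0, 1.0) for _ in range(128)]

# Several sub-models of different sizes derived from the same weights.
subnets = {s: prune(shared, s) for s in (0.0, 0.5, 0.7)}
for sparsity in subnets:
    kept = sum(magnitude_mask(shared, sparsity))
    print(f"sparsity={sparsity:.1f} kept={kept}/{len(shared)}")
```

Under this assumed criterion the masks are nested: every weight kept at a higher sparsity is also kept at a lower one, which is one simple way a single set of weights can serve several model sizes.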
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
