Optimizing Speech Recognition For The Edge
Yuan Shangguan, Jian Li, Qiao Liang, Raziel Alvarez, Ian McGraw

TL;DR
This paper explores methods to optimize speech recognition models for edge devices by transitioning from traditional architectures to efficient end-to-end neural models, employing pruning and quantization to achieve high accuracy in small, fast models.
Contribution
It introduces a process for transforming a baseline LSTM-based RNN-Transducer into a compact, efficient on-device speech recognizer using layer optimization, pruning, and quantization techniques.
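As a rough illustration of what pruning and quantization do to a weight vector, here is a minimal sketch in plain Python. It assumes simple magnitude-based pruning and symmetric 8-bit linear quantization. This summary does not specify the exact schemes the paper uses, so the function names (`magnitude_prune`, `quantize_int8`, `dequantize`) and the 50% sparsity level are hypothetical choices for illustration only.

```python
import random


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric linear quantization to the range [-127, 127].

    Returns the integer codes and the scale needed to map them back.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the toy weights are repeatable
    w = [rng.gauss(0.0, 0.1) for _ in range(16)]
    pruned = magnitude_prune(w, sparsity=0.5)  # assumed sparsity level
    codes, scale = quantize_int8(pruned)
    restored = dequantize(codes, scale)
    max_err = max(abs(a - b) for a, b in zip(pruned, restored))
    print("nonzero weights:", sum(1 for x in pruned if x != 0.0))
    print("max round-trip error:", max_err)
```

In this sketch, pruning shrinks the model by making half the weights zero (which a sparse storage format could then skip), while quantization stores each remaining weight in one byte instead of four, at the cost of a rounding error of at most half the scale per weight.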
Findings
Achieved an order-of-magnitude reduction in model size.
Maintained high recognition accuracy despite model compression.
Demonstrated feasibility of deploying high-quality speech recognition on mobile devices.
Abstract
While most deployed speech recognition systems today still run on servers, we are in the midst of a transition towards deployments on edge devices. This leap to the edge is powered by the progression from traditional speech recognition pipelines to end-to-end (E2E) neural architectures, and the parallel development of more efficient neural network topologies and optimization techniques. Thus, we are now able to create highly accurate speech recognizers that are both small and fast enough to execute on typical mobile devices. In this paper, we begin with a baseline RNN-Transducer architecture comprised of Long Short-Term Memory (LSTM) layers. We then experiment with a variety of more computationally efficient layer types, as well as apply optimization techniques like neural connection pruning and parameter quantization to construct a small, high quality, on-device speech recognizer that…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Neural Networks and Applications
Methods: Pruning
