Optimizing Speech Recognition For The Edge

Yuan Shangguan; Jian Li; Qiao Liang; Raziel Alvarez; Ian McGraw

arXiv:1909.12408 · cs.CL · February 10, 2020 · 58 cites

TL;DR

This paper explores methods to optimize speech recognition models for edge devices by transitioning from traditional architectures to efficient end-to-end neural models, employing pruning and quantization to achieve high accuracy in small, fast models.

Contribution

It introduces a process for transforming a baseline LSTM-based RNN-Transducer into a compact, efficient on-device speech recognizer using layer optimization, pruning, and quantization techniques.

Findings

1. Achieved an order of magnitude reduction in model size.

2. Maintained high recognition accuracy despite model compression.

3. Demonstrated feasibility of deploying high-quality speech recognition on mobile devices.

Abstract

While most deployed speech recognition systems today still run on servers, we are in the midst of a transition towards deployments on edge devices. This leap to the edge is powered by the progression from traditional speech recognition pipelines to end-to-end (E2E) neural architectures, and the parallel development of more efficient neural network topologies and optimization techniques. Thus, we are now able to create highly accurate speech recognizers that are both small and fast enough to execute on typical mobile devices. In this paper, we begin with a baseline RNN-Transducer architecture comprised of Long Short-Term Memory (LSTM) layers. We then experiment with a variety of more computationally efficient layer types, as well as apply optimization techniques like neural connection pruning and parameter quantization to construct a small, high quality, on-device speech recognizer that…
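The abstract names two compression techniques, neural connection pruning and parameter quantization. The sketch below illustrates the general idea of each on a flat list of weights. It is not the paper's procedure: this page does not give the pruning schedule or the quantization scheme, so the magnitude-based pruning criterion, the symmetric 8-bit mapping, and all function names (`prune_by_magnitude`, `quantize_int8`, `dequantize`) are assumptions made for illustration.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Hypothetical helper: magnitude pruning is assumed here, not taken from the paper.
    """
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric linear quantization to integers in [-127, 127].

    Hypothetical helper: returns the integer codes and the scale needed to
    map them back to floats.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


# Toy weights, invented for this example.
weights = [0.02, -0.75, 0.31, -0.04, 1.27, 0.005, -0.6, 0.11]

pruned = prune_by_magnitude(weights, sparsity=0.5)
codes, scale = quantize_int8(pruned)
restored = dequantize(codes, scale)
```

In this toy setting, pruning removes half of the connections and quantization stores each remaining weight as one 8-bit integer plus a shared scale, so each restored weight differs from its pruned value by at most about half a quantization step.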

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Neural Networks and Applications

Methods: Pruning