EdgeSpeechNets: Highly Efficient Deep Neural Networks for Speech Recognition on the Edge

Zhong Qiu Lin, Audrey G. Chung, and Alexander Wong

arXiv:1810.08559 · eess.AS · November 15, 2018 · 31 cites

TL;DR

This paper introduces EdgeSpeechNets, a family of highly efficient deep neural networks designed for speech recognition on edge devices, combining human and machine design strategies to achieve high accuracy with minimal resource usage.

Contribution

It presents a novel human-machine collaborative approach for designing low-footprint DNNs, resulting in EdgeSpeechNets that outperform existing models in accuracy and efficiency for on-device speech recognition.

Findings

01. EdgeSpeechNets achieve ~97% accuracy on the Google Speech Commands dataset.

02. EdgeSpeechNets are up to 7.8x smaller and up to 36x more computationally efficient than the existing models they are compared against.

03. They enable real-time speech recognition on resource-constrained devices.
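
To make "smaller" and "more computationally efficient" concrete, the sketch below shows one common way to count parameters and multiply-add operations for a stack of 2D convolution layers, and how a reduction factor such as "7.8x" is formed from those counts. It is only an illustration: the `Conv2D` class, the helper names, and every layer shape in it are hypothetical and are not taken from the paper or from the EdgeSpeechNet architectures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conv2D:
    """Hypothetical description of one 2D convolution layer (not from the paper)."""
    in_channels: int
    out_channels: int
    kernel_h: int
    kernel_w: int
    out_h: int  # height of the output feature map
    out_w: int  # width of the output feature map

    def params(self) -> int:
        # One kernel_h x kernel_w filter per (input, output) channel pair,
        # plus one bias per output channel.
        weights = self.in_channels * self.out_channels * self.kernel_h * self.kernel_w
        return weights + self.out_channels

    def mult_adds(self) -> int:
        # Each output element needs in_channels * kernel_h * kernel_w
        # multiply-adds; the bias addition is ignored here.
        per_output = self.in_channels * self.kernel_h * self.kernel_w
        return self.out_h * self.out_w * self.out_channels * per_output


def totals(layers):
    """Return (total parameters, total multiply-adds) for a list of layers."""
    return (
        sum(layer.params() for layer in layers),
        sum(layer.mult_adds() for layer in layers),
    )


def reduction_factor(baseline: int, candidate: int) -> float:
    """How many times smaller `candidate` is than `baseline`."""
    return baseline / candidate


if __name__ == "__main__":
    # Made-up shapes: a wide two-layer stack and a narrower one with the
    # same 40 x 100 output feature map.
    wide = [Conv2D(1, 64, 3, 3, 40, 100), Conv2D(64, 64, 3, 3, 40, 100)]
    narrow = [Conv2D(1, 16, 3, 3, 40, 100), Conv2D(16, 16, 3, 3, 40, 100)]

    wide_params, wide_macs = totals(wide)
    narrow_params, narrow_macs = totals(narrow)

    print(f"size reduction:    {reduction_factor(wide_params, narrow_params):.1f}x")
    print(f"compute reduction: {reduction_factor(wide_macs, narrow_macs):.1f}x")
```

Counts like these capture model size and arithmetic cost only; measured latency and memory use on a real device also depend on the hardware and runtime.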

Abstract

Despite showing state-of-the-art performance, deep learning for speech recognition remains challenging to deploy in on-device edge scenarios such as mobile and other consumer devices. Recently, there have been greater efforts in the design of small, low-footprint deep neural networks (DNNs) that are more appropriate for edge devices, with much of the focus on design principles for hand-crafting efficient network architectures. In this study, we explore a human-machine collaborative design strategy for building low-footprint DNN architectures for speech recognition through a marriage of human-driven principled network design prototyping and machine-driven design exploration. The efficacy of this design strategy is demonstrated through the design of a family of highly-efficient DNNs (nicknamed EdgeSpeechNets) for limited-vocabulary speech recognition. Experimental results using the Google…

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing