Bifocal Neural ASR: Exploiting Keyword Spotting for Inference Optimization

Jonathan Macoskey; Grant P. Strimel; Ariya Rastrow

arXiv:2108.01704 · eess.AS · August 5, 2021

TL;DR

This paper introduces Bifocal RNN-T, a speech recognition architecture that uses keyword spotting to choose a compute pathway for each audio frame, reducing inference cost by 29.1% while maintaining comparable word error rates.

Contribution

The paper proposes Bifocal RNN-T, built on a recurrent cell called the Bifocal LSTM (BFLSTM), which lets a keyword-spotting signal select the computation pathway at runtime to improve inference efficiency.

Findings

01. Achieves 29.1% reduction in inference cost
02. Maintains comparable word error rates
03. Compatible with quantization and sparsification techniques
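To see how per-frame routing turns into an overall cost figure, here is a hypothetical back-of-the-envelope model: if a fraction f of frames runs a cheaper branch whose per-step cost is a ratio ρ of the large branch, the expected relative cost is f·ρ + (1 − f), and the reduction is f·(1 − ρ). The function names and the example inputs (half the frames routed, cheaper branch at a quarter of the cost) are illustrative assumptions, not values from the paper; the model also ignores any components whose cost does not depend on routing.

```python
def expected_relative_cost(frac_small: float, small_to_large_cost: float) -> float:
    """Expected per-frame cost relative to always running the large branch.

    frac_small: fraction of frames routed to the cheaper branch (hypothetical input).
    small_to_large_cost: cost of one cheap-branch step divided by one large-branch step.
    """
    if not 0.0 <= frac_small <= 1.0:
        raise ValueError("frac_small must be in [0, 1]")
    if small_to_large_cost < 0.0:
        raise ValueError("small_to_large_cost must be non-negative")
    # Weighted average: routed frames pay the cheap cost, the rest pay full cost (1.0).
    return frac_small * small_to_large_cost + (1.0 - frac_small) * 1.0


def cost_reduction(frac_small: float, small_to_large_cost: float) -> float:
    """Fractional saving versus running the large branch on every frame."""
    return 1.0 - expected_relative_cost(frac_small, small_to_large_cost)


if __name__ == "__main__":
    # Hypothetical example inputs, not taken from the paper.
    print(cost_reduction(0.5, 0.25))  # prints 0.375
```

The saving grows with both the share of frames the keyword spotter can route and the gap between branch costs, which is why the paper's measured figure depends on its own data and model sizes rather than on this toy model.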

Abstract

We present Bifocal RNN-T, a new variant of the Recurrent Neural Network Transducer (RNN-T) architecture designed to improve inference-time latency on speech recognition tasks. The architecture enables a dynamic pivot for its runtime compute pathway, namely taking advantage of keyword spotting to select which component of the network to execute for a given audio frame. To accomplish this, we leverage a recurrent cell we call the Bifocal LSTM (BFLSTM), which we detail in the paper. The architecture is compatible with other optimization strategies such as quantization, sparsification, and applying time-reduction layers, making it especially applicable for deployed, real-time speech recognition settings. We present the architecture and report comparative experimental results on voice-assistant speech recognition tasks. Specifically, we show our proposed Bifocal RNN-T can improve inference…
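The abstract describes a recurrent cell that lets a per-frame keyword-spotting decision pick which component of the network runs. The sketch below is a minimal, hypothetical illustration of that idea in NumPy, not the authors' BFLSTM: it assumes two weight sets (a full-rank "large" branch and a rank-`rank` low-rank "small" branch) that read and write one shared hidden/cell state, plus one boolean flag per frame supplied by a keyword spotter. The names `BifocalLSTMSketch` and `run_sequence`, the low-rank small branch, and the multiply-accumulate (MAC) counting are all illustrative assumptions; which frames the paper routes to which branch is not asserted here.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class BifocalLSTMSketch:
    """Hypothetical two-branch LSTM cell sharing one (h, c) state (assumption)."""

    def __init__(self, input_dim, hidden_dim, rank, seed=0):
        rng = np.random.default_rng(seed)
        d = input_dim + hidden_dim
        scale = 1.0 / np.sqrt(d)
        # Large branch: one full matrix maps [x; h] to the stacked gates (i, f, g, o).
        self.w_large = rng.normal(0.0, scale, size=(4 * hidden_dim, d))
        self.b_large = np.zeros(4 * hidden_dim)
        # Small branch (assumed low-rank): U @ V stands in for a 4H x D matrix.
        self.u_small = rng.normal(0.0, scale, size=(4 * hidden_dim, rank))
        self.v_small = rng.normal(0.0, scale, size=(rank, d))
        self.b_small = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, h, c, use_small):
        """One LSTM step; the flag only chooses which weights compute the gates."""
        z = np.concatenate([x, h])
        if use_small:
            pre = self.u_small @ (self.v_small @ z) + self.b_small
        else:
            pre = self.w_large @ z + self.b_large
        H = self.hidden_dim
        i = sigmoid(pre[0:H])           # input gate
        f = sigmoid(pre[H:2 * H])       # forget gate
        g = np.tanh(pre[2 * H:3 * H])   # candidate cell update
        o = sigmoid(pre[3 * H:4 * H])   # output gate
        c_new = f * c + i * g
        h_new = o * np.tanh(c_new)
        return h_new, c_new

    def macs(self, use_small):
        """Matrix-vector multiply-accumulates for one step of the chosen branch."""
        if use_small:
            return self.u_small.size + self.v_small.size
        return self.w_large.size


def run_sequence(cell, frames, kws_flags):
    """Run frames through the cell, switching branches per frame on kws_flags."""
    h = np.zeros(cell.hidden_dim)
    c = np.zeros(cell.hidden_dim)
    outputs = []
    total_macs = 0
    for x, flag in zip(frames, kws_flags):
        h, c = cell.step(x, h, c, use_small=flag)
        total_macs += cell.macs(flag)
        outputs.append(h)
    return np.stack(outputs), total_macs


if __name__ == "__main__":
    cell = BifocalLSTMSketch(input_dim=8, hidden_dim=16, rank=4)
    frames = np.random.default_rng(1).normal(size=(10, 8))
    flags = [True] * 4 + [False] * 6  # hypothetical keyword-spotter output
    out, total = run_sequence(cell, frames, flags)
    print(out.shape, total)
```

Keeping a single shared (h, c) state is what lets this sketch switch branches on any frame without a handoff step; whether the actual BFLSTM shares state this way, and how the paper sizes its two pathways, should be checked against the paper itself.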

Taxonomy

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory