Sub 8-Bit Quantization of Streaming Keyword Spotting Models for Embedded Chipsets

Lu Zeng; Sree Hari Krishnan Parthasarathi; Yuzong Liu; Alex Escott; Santosh Kumar Cheekatmalla; Nikko Strom; Shiv Vitaladevuni

arXiv:2207.06920·cs.SD·September 9, 2022·1 cites

TL;DR

This paper introduces a two-stage sub 8-bit quantization aware training method for streaming keyword spotting models, achieving near-floating point accuracy while significantly reducing compute and memory requirements on embedded chipsets.

Contribution

The paper presents a novel 2-stage quantization algorithm for all model components, enabling efficient deployment of keyword spotting models on embedded hardware with minimal accuracy loss.

Findings

01. Achieves parity with the full floating-point model on the detection error tradeoff (DET) curve.

02. Up to 3x CPU efficiency improvement and 4x memory reduction.

03. Effective quantization across various bit widths (1, 4, 5, and 8 bits).
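
To give a rough sense of scale for the memory finding, the following sketch computes weight storage for a model of about 250K parameters (the size quoted in the abstract) at several bit widths. It is illustrative arithmetic only: it assumes tightly packed weights, treats every parameter as a weight, and ignores scales, biases, and activations. The names `weight_bytes` and `storage_table` are hypothetical, and this summary does not state which baseline the reported 4x figure is measured against.

```python
# Illustrative arithmetic only, not a measurement from the paper.
# Assumes tightly packed weights; ignores scales, biases, and activations.

NUM_PARAMS = 250_000  # model size quoted in the abstract


def weight_bytes(num_params: int, bits: int) -> int:
    """Bytes needed to store num_params values at the given bit width."""
    total_bits = num_params * bits
    return (total_bits + 7) // 8  # round up to whole bytes


def storage_table(num_params: int = NUM_PARAMS) -> dict:
    """Map bit width -> (bytes, reduction factor relative to 32-bit floats)."""
    baseline = weight_bytes(num_params, 32)
    table = {}
    for bits in (32, 8, 5, 4, 1):
        size = weight_bytes(num_params, bits)
        table[bits] = (size, baseline / size)
    return table


if __name__ == "__main__":
    for bits, (size, factor) in storage_table().items():
        print(f"{bits:>2}-bit: {size:>9,} bytes ({factor:.1f}x smaller than float32)")
```

Under these assumptions, 8-bit storage is 4x smaller than 32-bit floats. If sub 8-bit weights are held in 8-bit containers rather than packed, they occupy the same space as 8-bit weights.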

Abstract

We propose a novel 2-stage sub 8-bit quantization aware training algorithm for all components of a 250K parameter feedforward, streaming, state-free keyword spotting model. For the 1st-stage, we adapt a recently proposed quantization technique using a non-linear transformation with tanh(.) on dense layer weights. In the 2nd-stage, we use linear quantization methods on the rest of the network, including other parameters (bias, gain, batchnorm), inputs, and activations. We conduct large scale experiments, training on 26,000 hours of de-identified production, far-field and near-field audio data (evaluating on 4,000 hours of data). We organize our results in two embedded chipset settings: a) with commodity ARM NEON instruction set and 8-bit containers, we present accuracy, CPU, and memory results using sub 8-bit weights (4, 5, 8-bit) and 8-bit quantization of rest of the network; b) with…
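
The two stages described in the abstract can be sketched roughly as follows. This is a minimal forward-pass illustration, not the authors' implementation: the abstract does not give the formulas, so the tanh normalization shown here (squash, divide by the peak, round to a uniform grid) is one common form and is an assumption, as is the symmetric linear quantizer. The function names `tanh_quantize_weights` and `linear_quantize` are hypothetical. Quantization aware training also needs a way to pass gradients through the rounding step, which is omitted here.

```python
import math


def tanh_quantize_weights(weights, bits):
    """Stage 1 (assumed form): squash dense-layer weights with tanh,
    normalize to [-1, 1], then round onto a uniform grid of 2**bits levels."""
    squashed = [math.tanh(w) for w in weights]
    peak = max(abs(s) for s in squashed) or 1.0  # avoid division by zero
    steps = 2 ** bits - 1  # number of intervals between -1 and 1
    quantized = []
    for s in squashed:
        unit = (s / peak + 1.0) / 2.0              # map [-1, 1] to [0, 1]
        code = round(unit * steps)                 # integer code in [0, steps]
        quantized.append(2.0 * code / steps - 1.0)  # map back to [-1, 1]
    return quantized


def linear_quantize(values, bits=8):
    """Stage 2 (assumed form): symmetric linear quantization for the rest of
    the network. Returns integer codes and the scale that maps them back."""
    qmax = 2 ** (bits - 1) - 1                     # 127 for 8 bits
    peak = max(abs(v) for v in values) or 1.0
    scale = peak / qmax
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


if __name__ == "__main__":
    dense_weights = [-2.0, -0.1, 0.0, 0.3, 2.0]    # made-up example values
    print(tanh_quantize_weights(dense_weights, bits=4))
    codes, scale = linear_quantize([-1.0, 0.5, 1.0], bits=8)
    print(codes, [c * scale for c in codes])
```

With `bits=1`, the first function collapses every weight to -1 or +1. With 4 or 5 bits it yields at most 16 or 32 distinct values, which is what makes sub 8-bit storage possible.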

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Attentive Walk-Aggregating Graph Neural Network