4-bit Conformer with Native Quantization Aware Training for Speech Recognition

Shaojin Ding; Phoenix Meadowlark; Yanzhang He; Lukasz Lew; Shivani Agrawal; Oleg Rybakov

arXiv:2203.15952·eess.AS·March 6, 2023

TL;DR

This paper introduces a 4-bit quantization aware training method for Conformer-based speech recognition models, achieving significant size reduction without performance loss on large-scale datasets.

Contribution

It presents the first practical implementation of 4-bit quantization for large-scale ASR systems using native quantization aware training, enabling lossless compression.

Findings

01. Achieved 5.8x model size reduction with 4-bit quantization on LibriSpeech.

02. Demonstrated viability of 4-bit quantization in large-scale practical ASR systems.

03. Produced a 5x size reduction with mixed 4-bit and 8-bit weights without performance loss.
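
As a rough sanity check on the size-reduction figures above, the ideal ratio can be computed from bit widths alone. The sketch below is a back-of-envelope calculation, not the paper's accounting: it assumes a 32-bit floating-point baseline and ignores quantization scales and any layers kept at higher precision. The function name `compression_ratio` and the example fractions are hypothetical.

```python
def compression_ratio(bit_fractions, baseline_bits=32):
    """Ideal size reduction versus a model storing baseline_bits per weight.

    bit_fractions maps a bit width to the fraction of weights stored at
    that width; the fractions must sum to 1. Scales and any unquantized
    layers are ignored (an assumption of this sketch).
    """
    total = sum(bit_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # Average number of bits per weight across the whole model.
    avg_bits = sum(bits * frac for bits, frac in bit_fractions.items())
    return baseline_bits / avg_bits


# All weights at 4 bits: the upper bound for a 32-bit baseline.
print(compression_ratio({4: 1.0}))
# Hypothetical even split between 4-bit and 8-bit weights.
print(compression_ratio({4: 0.5, 8: 0.5}))
```

Under these assumptions an all-4-bit model is bounded at 8x, so reported reductions below that bound are plausible once overhead and higher-precision layers are counted.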

Abstract

Reducing the latency and model size has always been a significant research problem for live Automatic Speech Recognition (ASR) application scenarios. Along this direction, model quantization has become an increasingly popular approach to compress neural networks and reduce computation cost. Most of the existing practical ASR systems apply post-training 8-bit quantization. To achieve a higher compression rate without introducing additional performance regression, in this study, we propose to develop 4-bit ASR models with native quantization aware training, which leverages native integer operations to effectively optimize both training and inference. We conducted two experiments on state-of-the-art Conformer-based ASR models to evaluate our proposed quantization technique. First, we explored the impact of different precisions for both weight and activation quantization on the LibriSpeech…
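
To make the idea of 4-bit weight quantization concrete, here is a minimal sketch of generic symmetric per-tensor quantization in plain Python. It is not the paper's native quantization aware training and does not use the google/aqt API; the function name `quantize_dequantize`, the per-tensor scale, and the symmetric range of -7 to 7 are all assumptions chosen for illustration.

```python
def quantize_dequantize(weights, num_bits=4):
    """Map floats to signed num_bits integers and back to floats.

    Uses one scale for the whole tensor and a symmetric integer range
    [-qmax, qmax], where qmax = 2**(num_bits - 1) - 1 (7 for 4 bits).
    """
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level, then clip to the valid range.
    ints = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    # Dequantized values are what a quantized forward pass would see.
    dequant = [q * scale for q in ints]
    return ints, dequant, scale


weights = [0.7, -0.32, 0.12, 0.0]
ints, dequant, scale = quantize_dequantize(weights)
print(ints)     # integer levels, each storable in 4 bits
print(dequant)  # approximations of the original weights
```

Each dequantized weight differs from the original by at most half a quantization step (`scale / 2`). Quantization aware training, in general, exposes the model to this rounding error during training so that it can adapt to it, rather than applying quantization only after training.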

Code & Models

Repositories

google/aqt
jax

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Attentive Walk-Aggregating Graph Neural Network