4-bit Conformer with Native Quantization Aware Training for Speech Recognition
Shaojin Ding, Phoenix Meadowlark, Yanzhang He, Lukasz Lew, Shivani Agrawal, Oleg Rybakov

TL;DR
This paper introduces a 4-bit quantization aware training method for Conformer-based speech recognition models, achieving significant size reduction without performance loss on large-scale datasets.
Contribution
It presents the first practical implementation of 4-bit quantization for large-scale ASR systems using native quantization aware training, enabling lossless compression.
Findings
Achieved 5.8x model size reduction with 4-bit quantization on LibriSpeech.
Demonstrated viability of 4-bit quantization in large-scale practical ASR systems.
Produced a 5x size reduction with mixed 4-bit and 8-bit weights without performance loss.
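To make the size-reduction figures above easier to interpret, here is a minimal sketch of the arithmetic behind them. It assumes a 32-bit floating-point baseline and ignores the storage for quantization scales and any parameters kept in higher precision. The helper name `size_reduction` and the 40/60 split in the usage lines are hypothetical illustrations, not the paper's actual configuration.

```python
def size_reduction(bit_fractions, baseline_bits=32):
    """Return the model size reduction factor relative to the baseline.

    bit_fractions maps a bit width to the fraction of parameters stored
    at that width, e.g. {4: 0.4, 8: 0.6}. The fractions must sum to 1.
    Assumption: every parameter is quantized and scale overhead is ignored.
    """
    total = sum(bit_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("parameter fractions must sum to 1")
    # Average number of bits used to store one parameter.
    average_bits = sum(bits * frac for bits, frac in bit_fractions.items())
    return baseline_bits / average_bits


# Upper bounds under the stated assumptions.
all_4bit = size_reduction({4: 1.0})   # 32 / 4
all_8bit = size_reduction({8: 1.0})   # 32 / 8

# Hypothetical mix: 40% of parameters at 4 bits, 60% at 8 bits.
mixed = size_reduction({4: 0.4, 8: 0.6})
```

Under these assumptions a pure 4-bit model cannot exceed an 8x reduction, so a reported figure below 8x is consistent with some parameters or metadata staying at higher precision.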
Abstract
Reducing the latency and model size has always been a significant research problem for live Automatic Speech Recognition (ASR) application scenarios. Along this direction, model quantization has become an increasingly popular approach to compress neural networks and reduce computation cost. Most of the existing practical ASR systems apply post-training 8-bit quantization. To achieve a higher compression rate without introducing additional performance regression, in this study, we propose to develop 4-bit ASR models with native quantization aware training, which leverages native integer operations to effectively optimize both training and inference. We conducted two experiments on state-of-the-art Conformer-based ASR models to evaluate our proposed quantization technique. First, we explored the impact of different precisions for both weight and activation quantization on the LibriSpeech…
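The abstract mentions 4-bit weights, quantized activations, and native integer operations. The following is a minimal sketch of what that arithmetic can look like, not the authors' implementation: it assumes symmetric per-tensor quantization with a single scale, 4-bit weights, and 8-bit activations, and it omits training and gradient handling entirely. All function names are hypothetical.

```python
def quantize_symmetric(values, bits=4):
    """Quantize floats to signed integers with one shared scale.

    Assumption: symmetric range [-(2**(bits-1) - 1), 2**(bits-1) - 1],
    which is [-7, 7] for 4 bits.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clip to the allowed range.
    ints = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return ints, scale


def dequantize(ints, scale):
    """Map integer levels back to approximate float values."""
    return [i * scale for i in ints]


def integer_dot(x_ints, x_scale, w_ints, w_scale):
    """Dot product accumulated in integers, rescaled once at the end."""
    accumulator = sum(a * b for a, b in zip(x_ints, w_ints))
    return accumulator * x_scale * w_scale


weights = [0.875, -0.375, 0.05, 0.3]
activations = [1.0, 2.0, -1.0, 0.5]

w_ints, w_scale = quantize_symmetric(weights, bits=4)      # 4-bit weights
x_ints, x_scale = quantize_symmetric(activations, bits=8)  # 8-bit activations
approx = integer_dot(x_ints, x_scale, w_ints, w_scale)
```

The multiply-accumulate runs on integers and the two scales are applied once to the result, which is the property that lets integer hardware do most of the work. The rounding error of each weight is bounded by half the weight scale.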
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Attentive Walk-Aggregating Graph Neural Network
