Effective and Efficient Mixed Precision Quantization of Speech Foundation Models
Haoning Xu, Zhaoqing Li, Zengrui Jin, Huimeng Wang, Youjun Chen, Guinan Li, Mengzhe Geng, Shujie Hu, Jiajun Deng, Xunying Liu

TL;DR
This paper introduces a unified, single-stage mixed-precision quantization method for speech foundation models that improves the lossless compression ratio and reduces system compression time without a statistically significant increase in word error rate, demonstrated on the LibriSpeech dataset.
Contribution
The paper proposes a single-stage approach that integrates mixed-precision learning with quantized model parameter estimation, outperforming two-stage methods in lossless compression ratio and compression time.
Findings
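Increased the lossless compression ratio by up to 1.7x over uniform-precision baselines and up to 1.9x over two-stage mixed-precision baselines.
Reduced system compression time by up to 1.9 times (wav2vec2.0-base) and 1.5 times (HuBERT-large) compared with two-stage mixed-precision quantization.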
Achieved 8.6x compression with 3.5-bit quantization.
Abstract
This paper presents a novel mixed-precision quantization approach for speech foundation models that tightly integrates mixed-precision learning and quantized model parameter estimation into a single model compression stage. Experiments conducted on the LibriSpeech dataset with fine-tuned wav2vec2.0-base and HuBERT-large models suggest that the resulting mixed-precision quantized models increase the lossless compression ratio by factors of up to 1.7x and 1.9x over the respective uniform-precision and two-stage mixed-precision quantized baselines, which perform precision learning and model parameter quantization in separate, disjoint stages, while incurring no statistically significant word error rate (WER) increase over the 32-bit full-precision models. The system compression time of wav2vec2.0-base and HuBERT-large models is reduced by up to 1.9 and 1.5 times over the two-stage mixed-precision…
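To make the link between per-layer bit-widths and the compression ratio concrete, here is a minimal sketch of the arithmetic relative to a 32-bit full-precision model. It is not the authors' method: the layer names, parameter counts, bit-width assignments, and the helper functions `average_bits` and `compression_ratio` are all hypothetical, and the sketch assumes every parameter is quantized and ignores any storage overhead.

```python
from dataclasses import dataclass

FULL_PRECISION_BITS = 32  # baseline precision named in the abstract


@dataclass
class Layer:
    name: str        # hypothetical layer label
    num_params: int  # hypothetical parameter count
    bits: int        # bit-width assigned to this layer


def average_bits(layers):
    """Parameter-weighted average bit-width across all layers."""
    total_params = sum(layer.num_params for layer in layers)
    total_bits = sum(layer.num_params * layer.bits for layer in layers)
    return total_bits / total_params


def compression_ratio(layers):
    """Size of the 32-bit model divided by the size of the quantized model."""
    return FULL_PRECISION_BITS / average_bits(layers)


# Illustrative mixed-precision assignment (made-up numbers).
layers = [
    Layer("block0", 1_000_000, 8),
    Layer("block1", 1_000_000, 4),
    Layer("block2", 2_000_000, 2),
]

print(average_bits(layers))       # average bit-width of the toy model
print(compression_ratio(layers))  # compression relative to 32-bit storage
```

Under these assumptions, a mixed-precision model is summarized by its parameter-weighted average bit-width, which is how a fractional figure such as 3.5 bits can arise from integer per-layer precisions.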
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques
