Quantizing Whisper-small: How design choices affect ASR performance

Arthur Söhler; Julian Irigoyen; Andreas Søeborg Kirkedal

arXiv:2511.08093·eess.AS·May 22, 2026

TL;DR

This paper evaluates how different post-training quantization techniques affect the performance and size of Whisper-small speech recognition models, aiming to enable deployment on edge devices.

Contribution

It provides a unified, cross-library analysis of post-training quantization (PTQ) methods on Whisper-small across PyTorch, Optimum-Quanto, HQQ, and bitsandbytes, identifying the configurations that best trade size reduction against accuracy.

Findings

01. Dynamic int8 quantization with Quanto reduces model size by 57% while improving on the baseline's word error rate (WER).

02. Static quantization performs worse, likely because of Whisper's Transformer architecture.

03. More aggressive formats such as nf4 and int3 achieve up to 71% compression, at the cost of accuracy in noisy conditions.
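To put the reported size reductions in context, the following back-of-the-envelope sketch computes the ideal reduction for each bit-width. It rests on two assumptions that are not stated in the source: the baseline stores every weight as a 32-bit float, and every weight is quantized with no storage overhead. The function name `ideal_size_reduction` is hypothetical.

```python
# Back-of-the-envelope sketch, not the authors' measurement procedure.
# Assumptions (not from the paper): the baseline uses 32-bit floats for all
# weights, every weight is quantized, and scales/zero-points cost nothing.
BASELINE_BITS = 32

def ideal_size_reduction(bits):
    """Upper bound on the size reduction when each weight takes `bits` bits."""
    return 1.0 - bits / BASELINE_BITS

for name, bits in [("int8", 8), ("nf4", 4), ("int3", 3)]:
    print(f"{name}: at most {ideal_size_reduction(bits):.1%} smaller")
```

Under these assumptions the bounds sit above the reported 57% and 71%. One plausible reading, offered here only as a guess, is that some tensors stay in higher precision and that quantization metadata adds overhead.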

Abstract

Large speech recognition models like Whisper-small achieve high accuracy but are difficult to deploy on edge devices due to their high computational demand. To this end, we present a unified, cross-library evaluation of post-training quantization (PTQ) on Whisper-small that disentangles the impact of quantization scheme, method, granularity, and bit-width. Our study is based on four libraries: PyTorch, Optimum-Quanto, HQQ, and bitsandbytes. Experiments on LibriSpeech test-clean and test-other show that dynamic int8 quantization with Quanto offers the best trade-off, reducing model size by 57% while improving on the baseline's word error rate. Static quantization performed worse, likely due to Whisper's Transformer architecture, while more aggressive formats (e.g., nf4, int3) achieved up to 71% compression at the cost of accuracy in noisy conditions. Overall, our results demonstrate that…
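Two of the design choices named in the abstract, granularity and bit-width, can be illustrated with a minimal pure-Python sketch of symmetric round-to-nearest quantization. This is a toy illustration, not the authors' pipeline and not the API of PyTorch, Optimum-Quanto, HQQ, or bitsandbytes; the helper names (`quantize`, `dequantize`, `mean_abs_error`) and the weight values are hypothetical.

```python
# Illustrative sketch only: symmetric round-to-nearest quantization.
# All names and values here are hypothetical, chosen for the example.

def quantize(values, bits):
    """Map floats to signed integer codes using one shared scale."""
    qmax = 2 ** (bits - 1) - 1            # 127 for int8, 3 for int3
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]

def mean_abs_error(matrix, bits, per_row):
    """Round-trip error with one scale per row (finer) or per matrix (coarser)."""
    groups = matrix if per_row else [[v for row in matrix for v in row]]
    total, count = 0.0, 0
    for group in groups:
        codes, scale = quantize(group, bits)
        restored = dequantize(codes, scale)
        total += sum(abs(a - b) for a, b in zip(group, restored))
        count += len(group)
    return total / count

# Toy weight matrix whose rows differ in magnitude by roughly 100x.
weights = [
    [0.011, -0.024, 0.007, 0.019],
    [1.30, -2.10, 0.85, 1.95],
]

for bits in (8, 3):
    coarse = mean_abs_error(weights, bits, per_row=False)
    fine = mean_abs_error(weights, bits, per_row=True)
    print(f"int{bits}: per-tensor error={coarse:.5f}, per-row error={fine:.5f}")
```

On this toy matrix, a single shared scale is dominated by the large-magnitude row, so the small row loses most of its precision; per-row scales reduce that error, and dropping from 8 to 3 bits increases it. Real libraries differ in how they choose scales, group sizes, and data types such as nf4.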

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques