Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models

Chen Feng; Yicheng Lin; Shaojie Zhuo; Chenzheng Su; Ramchalam Kinattinkara Ramakrishnan; Zhaocong Yuan; Xiaopeng Zhang

arXiv:2507.07877 · cs.SD · August 5, 2025

TL;DR

This paper benchmarks advanced post-training quantization methods on edge ASR models, revealing that 3-bit quantization can maintain high accuracy, thus enabling efficient deployment on resource-limited devices.

Contribution

It provides a comprehensive evaluation of eight state-of-the-art PTQ methods on two leading edge-ASR model families, Whisper and Moonshine, offering insights into quantization trade-offs for edge deployment.

Findings

1. 3-bit quantization can preserve accuracy with advanced PTQ techniques
2. Quantization impacts model accuracy, memory, and computational efficiency
3. Benchmark results guide optimal quantization configurations for edge ASR
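To make the memory side of this trade-off concrete, the helper below does the back-of-the-envelope arithmetic for weight storage. It is a sketch under stated assumptions, not a figure from the paper: it counts only weights, assumes one 16-bit scale per group of `group_size` weights, and ignores zero-points, activations, and packing padding. The function name and the 100M-parameter model are hypothetical.

```python
from typing import Optional


def weight_memory_bytes(num_params: int, bits: int,
                        group_size: Optional[int] = None,
                        scale_bits: int = 16) -> float:
    """Approximate weight storage in bytes for a uniformly quantized model.

    Counts the packed quantized weights plus, if group_size is given,
    one scale of `scale_bits` bits per group. Zero-points are ignored.
    """
    payload = num_params * bits / 8
    overhead = 0.0
    if group_size:
        num_groups = -(-num_params // group_size)  # ceiling division
        overhead = num_groups * scale_bits / 8
    return payload + overhead


# Hypothetical 100M-parameter model: 16-bit baseline vs 3-bit with groups of 128.
fp16 = weight_memory_bytes(100_000_000, bits=16)
w3 = weight_memory_bytes(100_000_000, bits=3, group_size=128)
print(f"{fp16 / 1e6:.1f} MB -> {w3 / 1e6:.1f} MB ({fp16 / w3:.2f}x smaller)")
```

Under these assumptions the weights shrink from 200 MB to about 39 MB, roughly a 5.1x reduction; the per-group scales add about 4% on top of the 3-bit payload, which is why group size is itself a tuning knob.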

Abstract

Recent advances in Automatic Speech Recognition (ASR) have demonstrated remarkable accuracy and robustness in diverse audio applications, such as live transcription and voice command processing. However, deploying these models on resource-constrained edge devices (e.g., IoT devices, wearables) still presents substantial challenges due to strict limits on memory, compute, and power. Quantization, particularly Post-Training Quantization (PTQ), offers an effective way to reduce model size and inference cost without retraining. Despite its importance, the performance implications of various advanced quantization methods and bit-width configurations on ASR models remain unclear. In this work, we present a comprehensive benchmark of eight state-of-the-art (SOTA) PTQ methods applied to two leading edge-ASR model families, Whisper and Moonshine. We systematically evaluate model performances…
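The core PTQ idea of mapping a trained model's weights to a few integer levels without retraining can be illustrated with the simplest baseline, symmetric round-to-nearest (RTN) quantization with one scale per group of weights. This is a minimal sketch for intuition only: it is not presented as one of the eight methods benchmarked in the paper, the function names and toy weights are hypothetical, and the group size of 4 is chosen for readability (practical weight-only schemes commonly use groups of 64 or 128).

```python
def quantize_rtn(weights, bits=3, group_size=4):
    """Symmetric round-to-nearest (RTN) quantization with one scale per group.

    Returns integer codes in [-(2**(bits-1)), 2**(bits-1) - 1] and one
    floating-point scale per group of `group_size` consecutive weights.
    """
    if bits < 2:
        raise ValueError("symmetric signed codes need at least 2 bits")
    qmax = 2 ** (bits - 1) - 1      # 3 for 3-bit signed codes
    qmin = -(2 ** (bits - 1))       # -4 for 3-bit signed codes
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(x) for x in group)
        # Map the largest magnitude in the group to qmax; guard all-zero groups.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        # round() is Python's round-half-to-even; clipping keeps codes in range.
        codes.extend(max(qmin, min(qmax, round(x / scale))) for x in group)
    return codes, scales


def dequantize_rtn(codes, scales, group_size=4):
    """Reconstruct approximate weights: each code times its group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


# Toy weights (hypothetical): two groups of four.
w = [0.9, -0.3, 0.05, -1.2, 0.4, 0.4, -0.1, 0.0]
codes, scales = quantize_rtn(w, bits=3, group_size=4)
w_hat = dequantize_rtn(codes, scales, group_size=4)
print(codes)
print([round(x, 3) for x in w_hat])
```

Each 3-bit code takes one of only 8 levels, so the reconstruction error of every weight is bounded by half of its group's scale. More advanced PTQ methods aim to reduce the accuracy impact of this rounding error, for example by using calibration data.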

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing