Diagnostic-Driven Layer-Wise Compensation for Post-Training Quantization of Encoder-Decoder ASR Models

Xinyu Wang; Ziyu Zhao; Yajie Luo; Yihong Wu; Liheng Ma; Jingrui Tian; Lei Ding; Xiao-Wen Chang; Peng Lu

arXiv:2601.02455 · cs.SD · April 28, 2026

TL;DR

FADE is a diagnostic-driven framework that adaptively compensates for layer-specific quantization errors in encoder-decoder ASR models, improving recognition accuracy after post-training quantization without retraining.

Contribution

It introduces a layer-wise adaptive compensation method for encoder-decoder ASR models that combines a per-layer vulnerability score with a calibration reliability score.

Findings

1. FADE lowers Word Error Rate (WER) across multiple models and benchmarks.
2. It reduces run-to-run variance in quantized ASR models.
3. It remains effective at 3- and 4-bit weight quantization without retraining.
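
To make "3- and 4-bit weight quantization" concrete, here is a minimal sketch of symmetric per-output-channel round-to-nearest (RTN) quantization. This is a generic baseline quantizer assumed for illustration only; the page does not say which quantizer or grid FADE uses, and `quantize_rtn` is a hypothetical helper name.

```python
import numpy as np


def quantize_rtn(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-output-channel round-to-nearest (RTN) fake quantization.

    Each row of `w` (one output channel) gets its own scale so that its
    largest-magnitude weight lands on the largest positive integer level.
    Returns the dequantized float weights, same shape as `w`.
    """
    qmax = 2 ** (bits - 1) - 1            # 3 at 3-bit, 7 at 4-bit
    qmin = -(2 ** (bits - 1))             # -4 at 3-bit, -8 at 4-bit
    absmax = np.abs(w).max(axis=1, keepdims=True)
    scale = np.where(absmax > 0, absmax / qmax, 1.0)  # avoid a zero scale for all-zero rows
    q = np.clip(np.round(w / scale), qmin, qmax)       # integer codes; np.round rounds half to even
    return q * scale


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 64))
for bits in (3, 4):
    err = np.abs(w - quantize_rtn(w, bits)).mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

Dropping from 4 to 3 bits roughly halves the number of usable levels per channel, so the per-weight rounding error grows; that added noise is what layer-wise compensation methods try to counteract.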

Abstract

Deploying Automatic Speech Recognition (ASR) models on memory-constrained edge devices requires aggressive low-bit weight quantization. Layer-wise post-training quantization is practical and effective, but it suffers from cross-layer error accumulation. Existing compensation methods typically use a single global strength for all layers, which is ill-suited to encoder-decoder ASR models whose acoustic encoder and linguistic decoder exhibit markedly different sensitivities to quantization noise. We propose FADE, a diagnostic-driven framework that assigns each layer an adaptive compensation coefficient by combining two complementary signals: an intrinsic vulnerability score from weight geometry and a calibration reliability score from the data-driven solution. The resulting layer-wise coefficient balances local quantization fidelity against cross-layer error correction, enabling tailored…
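
The abstract describes the mechanism only at a high level, so the following is a hedged NumPy sketch of the general idea, not FADE itself. It assumes that "cross-layer error correction" means fitting the layer to reproduce the full-precision output from the drifted inputs produced by already-quantized earlier layers (a ridge least-squares target), and that the per-layer coefficient interpolates between the original weights and that compensated target before quantizing. The vulnerability formula (a max/RMS outlier ratio), the reliability formula (held-out error reduction on calibration data), the product rule combining them, and every function name here are hypothetical stand-ins, not the paper's published definitions; the quantizer is plain RTN.

```python
import numpy as np


def quantize_rtn(w, bits):
    # Symmetric per-output-channel round-to-nearest fake quantization.
    qmax = 2 ** (bits - 1) - 1
    absmax = np.abs(w).max(axis=1, keepdims=True)
    scale = np.where(absmax > 0, absmax / qmax, 1.0)
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


def compensated_weights(w, x_fp, x_q, damp=0.01):
    """Ridge solution of min_V ||V x_q - w x_fp||_F^2 + lam * ||V||_F^2.

    x_fp: (in_features, n) layer inputs from the full-precision model.
    x_q:  (in_features, n) layer inputs after the already-quantized earlier layers.
    """
    h = x_q @ x_q.T
    lam = damp * np.mean(np.diag(h))          # damping relative to the average input energy
    h = h + lam * np.eye(h.shape[0])
    rhs = w @ x_fp @ x_q.T                    # normal equations: V h = w x_fp x_q^T
    return np.linalg.solve(h, rhs.T).T        # h is symmetric, so solve h V^T = rhs^T


def vulnerability_score(w):
    # HYPOTHETICAL "weight geometry" signal: max/RMS outlier ratio mapped to [0, 1).
    # Heavy-tailed weights stretch the quantization grid, so a larger ratio counts as more vulnerable.
    rms = np.sqrt(np.mean(w ** 2))
    if rms == 0:
        return 0.0
    return float(1.0 - rms / np.abs(w).max())


def reliability_score(w, x_fp, x_q, damp=0.01):
    # HYPOTHETICAL "calibration reliability" signal: fit the compensated weights on
    # half of the calibration columns and measure how much of the accumulated
    # output error they remove on the other half (1 = all of it, 0 = none or worse).
    n = x_fp.shape[1] // 2
    w_c = compensated_weights(w, x_fp[:, :n], x_q[:, :n], damp)
    target = w @ x_fp[:, n:]
    err_plain = np.linalg.norm(w @ x_q[:, n:] - target)
    if err_plain == 0:
        return 0.0                            # nothing to correct on held-out data
    err_comp = np.linalg.norm(w_c @ x_q[:, n:] - target)
    return float(np.clip(1.0 - err_comp / err_plain, 0.0, 1.0))


def compensate_and_quantize(w, x_fp, x_q, bits=4, damp=0.01):
    # HYPOTHETICAL combination rule: compensate strongly only when the layer is
    # vulnerable AND the calibration-based correction looks trustworthy.
    alpha = vulnerability_score(w) * reliability_score(w, x_fp, x_q, damp)
    w_c = compensated_weights(w, x_fp, x_q, damp)
    # alpha = 0: quantize w as-is, ignoring upstream drift;
    # alpha = 1: quantize the fully compensated weights.
    w_target = (1.0 - alpha) * w + alpha * w_c
    return quantize_rtn(w_target, bits), alpha


rng = np.random.default_rng(0)
w = rng.standard_normal((16, 32))
x_fp = rng.standard_normal((32, 256))
drift = np.eye(32) + 0.05 * rng.standard_normal((32, 32))
x_q = drift @ x_fp                            # synthetic stand-in for accumulated upstream error
w_hat, alpha = compensate_and_quantize(w, x_fp, x_q, bits=4)
print(f"alpha = {alpha:.3f}")
```

Multiplying the two scores is only one possible rule: it drives the coefficient toward zero whenever either signal is weak, which matches the abstract's point that compensation strength should be set per layer rather than by one global value.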
