Rethinking Entropy Allocation in LLM-based ASR: Understanding the Dynamics between Speech Encoders and LLMs

Yuan Xie, Jiaqi Song, Guang Qiu, Xianliang Wang, Ming Lei, Jie Gao, and Jie Wu

arXiv:2604.08003 · eess.AS · April 10, 2026

TL;DR

This paper analyzes how LLM-based ASR systems allocate entropy reduction between the speech encoder and the LLM, and proposes a multi-stage training strategy that improves parameter efficiency and reduces hallucinations while achieving competitive results with fewer parameters.

Contribution

The paper introduces three entropy-allocation metrics and a multi-stage training approach that improves parameter efficiency and hallucination robustness in LLM-based ASR.

Findings

01. Achieves state-of-the-art performance with only 2.3B parameters.

02. Effectively mitigates hallucinations through decoupling-oriented design.

03. Redesigns pretraining to address speech-text modality gap.

Abstract

Integrating large language models (LLMs) into automatic speech recognition (ASR) has become a dominant paradigm. Although recent LLM-based ASR models have shown promising performance on public benchmarks, it remains challenging to balance recognition quality with latency and overhead, while hallucinations further limit real-world deployment. In this study, we revisit LLM-based ASR from an entropy allocation perspective and introduce three metrics to characterize how training paradigms allocate entropy reduction between the speech encoder and the LLM. To remedy entropy-allocation inefficiencies in prevailing approaches, we propose a principled multi-stage training strategy grounded in capability-boundary awareness, optimizing parameter efficiency and hallucination robustness. Specifically, we redesign the pretraining strategy to alleviate the speech-text modality gap, and further…
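To make the idea of "allocating entropy reduction" between two components concrete, here is a minimal Python sketch. It is an illustration only: the abstract does not define the paper's three metrics, so the function `reduction_shares`, the three-stage framing (prior, after encoder, after LLM), and the toy distributions are all assumptions made for this example, not the authors' method.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def reduction_shares(h_prior, h_after_encoder, h_after_llm):
    """Hypothetical helper: split the total entropy reduction into the
    fraction attributed to the encoder and the fraction attributed to the LLM.
    This is NOT one of the paper's metrics; it only illustrates the concept."""
    total = h_prior - h_after_llm
    if total <= 0:
        raise ValueError("no entropy reduction to allocate")
    encoder_share = (h_prior - h_after_encoder) / total
    return encoder_share, 1.0 - encoder_share


# Toy distributions over an 8-token vocabulary (made-up numbers).
prior = [1 / 8] * 8                                   # uniform uncertainty
after_encoder = [0.5, 0.25, 0.125, 0.125, 0, 0, 0, 0]  # encoder narrows it
after_llm = [0.5, 0.5, 0, 0, 0, 0, 0, 0]               # LLM narrows it further

h_prior = shannon_entropy(prior)
h_enc = shannon_entropy(after_encoder)
h_llm = shannon_entropy(after_llm)
encoder_share, llm_share = reduction_shares(h_prior, h_enc, h_llm)
print(f"encoder share: {encoder_share:.3f}, LLM share: {llm_share:.3f}")
```

In this toy setting, a larger encoder share would mean the speech encoder resolves most of the uncertainty before the LLM sees the input. How the paper actually measures this is described in its full text, not in this excerpt.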
