Listen Like a Teacher: Mitigating Whisper Hallucinations using Adaptive Layer Attention and Knowledge Distillation

Kumud Tripathi; Aditya Srinivas Menon; Aman Gaurav; Raj Prakash Gohil; Pankaj Wasnik

arXiv:2511.14219 · cs.AI · November 19, 2025

TL;DR

This paper introduces a two-stage approach that combines Adaptive Layer Attention (ALA) with knowledge distillation (KD) to reduce hallucinations in Whisper speech recognition, especially under noisy conditions, improving both robustness and transcription accuracy.

Contribution

It proposes a new architecture that enhances Whisper's robustness through adaptive layer attention and knowledge distillation, directly addressing hallucination errors in noisy environments.

Findings

1. Significant reduction in hallucinations and word error rates in noisy conditions
2. Improved robustness of the Whisper model without sacrificing clean-speech performance
3. Effective use of multi-objective knowledge distillation for noise robustness
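Finding 3 credits a multi-objective knowledge distillation framework, but this page does not list the objectives. As a hedged illustration only, the sketch below combines three objectives that are common in distillation generally: cross-entropy against the reference token, a temperature-scaled KL divergence between teacher and student output distributions (the T² factor follows standard Hinton-style distillation), and a mean-squared error between hidden representations. The function names, the weights `alpha`, `beta`, `gamma`, the temperature, and what inputs the teacher and student receive are all assumptions, not details taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: Vector, temperature: float = 1.0) -> Vector:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Vector, target: int) -> float:
    """Negative log-likelihood of the reference token under the student."""
    return -math.log(softmax(logits)[target])


def kl_divergence(p: Vector, q: Vector) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mse(a: Vector, b: Vector) -> float:
    """Mean-squared error between two hidden-state vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_objective_kd_loss(
    student_logits: Vector,
    teacher_logits: Vector,
    target: int,
    student_hidden: Vector,
    teacher_hidden: Vector,
    alpha: float = 1.0,  # hypothetical weight on the supervised term
    beta: float = 1.0,  # hypothetical weight on output distillation
    gamma: float = 1.0,  # hypothetical weight on hidden-state matching
    temperature: float = 2.0,  # hypothetical distillation temperature
) -> float:
    """Weighted sum of three assumed objectives for a single decoding step."""
    ce = cross_entropy(student_logits, target)
    kd = kl_divergence(
        softmax(teacher_logits, temperature), softmax(student_logits, temperature)
    ) * temperature ** 2  # T^2 keeps gradient scale comparable across temperatures
    feat = mse(student_hidden, teacher_hidden)
    return alpha * ce + beta * kd + gamma * feat
```

When the student's logits and hidden states match the teacher's, the KL and MSE terms vanish and the loss reduces to `alpha` times the cross-entropy; any mismatch can only increase it, since both extra terms are non-negative.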

Abstract

The Whisper model, an open-source automatic speech recognition (ASR) system, is widely adopted for its strong performance in multilingual and zero-shot settings. However, it frequently suffers from hallucination errors, especially under noisy acoustic conditions. Previous work on reducing hallucinations in Whisper-style ASR systems has focused primarily on audio preprocessing or on post-processing transcriptions to filter out erroneous content, while modifying the Whisper model itself to mitigate hallucinations directly remains largely unexplored. To address this gap, we present a two-stage architecture that first enhances encoder robustness through Adaptive Layer Attention (ALA) and then further suppresses hallucinations using a multi-objective knowledge distillation (KD) framework. In the first stage, ALA groups encoder layers into semantically coherent blocks via inter-layer…
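The abstract is cut off before it states how ALA forms its blocks or combines them. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it assumes each encoder layer's output has already been mean-pooled over time into one vector, that neighbouring layers stay in the same block while their cosine similarity is at least a hypothetical `threshold`, and that blocks are fused with softmax attention weights over hypothetical learnable `block_scores`. All function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def group_layers(layer_reps: List[Vector], threshold: float) -> List[List[int]]:
    """Split layer indices into contiguous blocks (assumed criterion):
    open a new block whenever adjacent-layer similarity drops below threshold."""
    if not layer_reps:
        return []
    blocks = [[0]]
    for i in range(1, len(layer_reps)):
        if cosine(layer_reps[i - 1], layer_reps[i]) >= threshold:
            blocks[-1].append(i)
        else:
            blocks.append([i])
    return blocks


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_blocks(
    layer_reps: List[Vector], blocks: List[List[int]], block_scores: List[float]
) -> Vector:
    """Average layers inside each block, then take a softmax-weighted sum of
    the block vectors. block_scores stand in for learned parameters (hypothetical)."""
    if len(block_scores) != len(blocks):
        raise ValueError("need exactly one score per block")
    block_vecs = [mean([layer_reps[i] for i in b]) for b in blocks]
    weights = softmax(block_scores)
    dim = len(block_vecs[0])
    return [sum(w * v[d] for w, v in zip(weights, block_vecs)) for d in range(dim)]


# Usage: two clearly separated pairs of "layers" with equal block scores.
reps = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
blocks = group_layers(reps, threshold=0.8)  # [[0, 1], [2, 3]]
fused = fuse_blocks(reps, blocks, [0.0, 0.0])  # approximately [0.5, 0.5]
```

With equal scores the fusion is a plain average of the block means; in a trained model the scores would be learned (and could depend on the input), which is what would make the layer weighting adaptive. Whether ALA works exactly this way is not stated in the visible part of the abstract.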

Videos

Listen Like a Teacher: Mitigating Whisper Hallucinations Using Adaptive Layer Attention and Knowledge Distillation (Underline)

Taxonomy

Topics: Emotion and Mood Recognition · Speech and Audio Processing · Hearing Loss and Rehabilitation