BEST-RQ-Based Self-Supervised Learning for Whisper Domain Adaptation

Raphaël Bagat; Irina Illina; Emmanuel Vincent

arXiv:2510.24570 · cs.CL · May 13, 2026

TL;DR

This paper introduces BEARD, a self-supervised learning framework that adapts Whisper's encoder for low-resource, noisy, and specialized speech domains using unlabeled data, improving ASR performance.

Contribution

The paper presents the first use of a self-supervised learning objective for domain adaptation of Whisper, combining BEST-RQ with knowledge distillation for improved speech recognition.

Findings

1. BEARD achieves a 12% relative improvement over baseline models.

2. The approach effectively adapts Whisper to challenging Air Traffic Control (ATC) communication data.

3. Self-supervised adaptation on about 5,000 hours of unlabeled speech improves ASR accuracy in a low-resource domain.
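
To make the "12% relative improvement" figure concrete, the sketch below shows how a relative improvement is usually computed for an error-rate metric. It assumes the metric is word error rate (WER), which this summary does not state, and the input numbers are hypothetical, not results from the paper. The function name `relative_improvement` is illustrative only.

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction of an error rate, as a fraction of the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values (in percent), NOT taken from the paper.
example = relative_improvement(baseline_wer=25.0, new_wer=22.0)
print(f"{example:.0%}")
```

Under this definition, a drop from 25.0 to 22.0 is a 3-point absolute reduction but a 12% relative one, which is why the two kinds of figures should not be compared directly.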

Abstract

Automatic Speech Recognition (ASR) systems, despite large multilingual training, struggle in low-resource scenarios where labeled data is scarce. We propose BEARD (BEST-RQ Encoder Adaptation with Re-training and Distillation), a novel framework designed to adapt Whisper's encoder with unlabeled data. Unlike traditional self-supervised learning methods, BEARD uniquely combines a BEST-RQ objective with knowledge distillation from a frozen teacher encoder, ensuring the encoder's complementarity with the pre-trained decoder. Our experiments focus on the ATCO2 corpus from the challenging Air Traffic Control (ATC) communications domain, characterized by non-native speech, noise, and specialized phraseology. Using about 5,000 hours of untranscribed speech for BEARD and 2 hours of transcribed speech for fine-tuning, the proposed approach significantly outperforms previous baseline and…
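
The combination described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes the standard BEST-RQ recipe (a frozen random projection and a frozen random codebook produce discrete targets, and the encoder is trained to predict them on masked frames) and it assumes a mean-squared-error distillation term with a weight `lam`. The abstract does not specify the distillation loss, the weighting, or the layers involved, so those choices and all names below are hypothetical. The encoder itself is omitted, and random arrays stand in for its outputs.

```python
import numpy as np


def rq_targets(feats, proj, codebook):
    """Frozen random-projection quantizer: map each frame to a codebook index."""
    z = feats @ proj                                        # (T, d_code)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    c = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    # For unit vectors, the nearest codeword is the one with highest cosine similarity.
    return np.argmax(z @ c.T, axis=1)                       # (T,)


def masked_cross_entropy(logits, targets, mask):
    """Cross-entropy between predicted and quantized labels, on masked frames only."""
    shifted = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    return nll[mask].mean()


def distill_mse(student, teacher):
    """Assumed distillation term: MSE between student and frozen-teacher outputs."""
    return np.mean((student - teacher) ** 2)


def combined_loss(logits, targets, mask, student, teacher, lam=1.0):
    """BEST-RQ loss plus a weighted distillation loss (the weighting is an assumption)."""
    return masked_cross_entropy(logits, targets, mask) + lam * distill_mse(student, teacher)


# Toy data standing in for real features and encoder outputs.
rng = np.random.default_rng(0)
T, d_in, d_code, K, d_model = 50, 80, 16, 32, 24
feats = rng.normal(size=(T, d_in))            # unmasked input features
proj = rng.normal(size=(d_in, d_code))        # frozen, never trained
codebook = rng.normal(size=(K, d_code))       # frozen, never trained
targets = rq_targets(feats, proj, codebook)

mask = rng.random(T) < 0.4                    # frames hidden from the student
mask[0] = True                                # guarantee at least one masked frame
teacher = rng.normal(size=(T, d_model))       # frozen teacher encoder output
student = teacher + 0.1 * rng.normal(size=(T, d_model))
logits = rng.normal(size=(T, K))              # student's codebook predictions

loss = combined_loss(logits, targets, mask, student, teacher, lam=0.5)
print(float(loss))
```

The intuition behind pairing the two terms is the one the abstract gives: the BEST-RQ term lets the encoder learn from unlabeled in-domain audio, while the distillation term keeps its outputs close to what the pre-trained decoder expects.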
