Zero-Shot Context-Aware ASR for Diverse Arabic Varieties

Bashar Talafha; Amin Abu Alhassan; Muhammad Abdul-Mageed

arXiv:2511.18774 · cs.CL · January 13, 2026

TL;DR

This paper introduces a context-aware decoding approach for zero-shot Arabic speech recognition, improving accuracy across diverse dialects and accents by leveraging external information without retraining models.

Contribution

It proposes prompt-based methods for promptable encoder-decoder models (e.g., Whisper) and proxy-guided n-best selection for CTC models. Together these make context-aware inference applicable across ASR architectures and improve zero-shot recognition of dialectal Arabic.

Findings

01. Average WER reductions of over 20% on MSA and accented Arabic.
02. Proxy-guided selection reduces WER by 15.6% on MSA.
03. Context-aware decoding extends beyond promptable encoder-decoder models to CTC ASR through proxy-guided n-best selection.
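
These findings are stated as WER reductions. WER is the number of substitutions S, deletions D and insertions I in a minimum-edit alignment between the hypothesis and the reference, divided by the number N of reference words. This page does not say whether the quoted percentages are relative or absolute (percentage-point) reductions, so both definitions are written out below. The worked numbers are hypothetical and are not results from the paper.

```latex
\mathrm{WER} = \frac{S + D + I}{N}

\Delta_{\text{abs}} = \mathrm{WER}_{\text{base}} - \mathrm{WER}_{\text{ctx}},
\qquad
\Delta_{\text{rel}} = \frac{\mathrm{WER}_{\text{base}} - \mathrm{WER}_{\text{ctx}}}{\mathrm{WER}_{\text{base}}} \times 100\%

% Hypothetical example: WER_base = 40.0%, WER_ctx = 32.0%
% \Delta_abs = 8.0 percentage points, \Delta_rel = 8.0 / 40.0 = 20%
```

The same 8-point drop would count as a 20% relative reduction from a 40% baseline but only about 10% from an 80% baseline. This is why the two readings of a reported percentage can differ substantially on high-WER dialectal speech.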

Abstract

Zero-shot ASR for Arabic remains challenging: while multilingual models perform well on Modern Standard Arabic (MSA), error rates rise sharply on dialectal and accented speech due to linguistic mismatch and scarce labeled data. We study context-aware decoding as a lightweight test-time adaptation paradigm that conditions inference on external side information without parameter updates. For promptable encoder-decoder ASR (e.g., Whisper), we incorporate context through (i) decoder prompting with first-pass hypotheses and (ii) encoder/decoder prefixing with retrieved speech-text exemplars, complemented by simple prompt reordering and optional speaker-matched synthetic exemplars to improve robustness in informal and multi-speaker settings. To extend contextual adaptation beyond promptable architectures, we introduce proxy-guided n-best selection for CTC ASR: given one or more external proxy…
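
The abstract is cut off before it defines proxy-guided n-best selection, so what follows is only a minimal sketch built on our own assumptions, not the authors' method. We assume that the CTC model returns an n-best list ordered best-first, and that the "one or more external proxy" inputs are proxy transcripts of the same utterance (for example, output from another ASR system). Under those assumptions, the sketch picks the hypothesis with the lowest mean word-level edit distance to the proxies. The function names `word_edit_distance` and `select_with_proxies` are hypothetical, and the paper's actual scoring rule may differ. For instance, it might also weigh the CTC scores.

```python
from typing import Sequence


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over word tokens (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, wa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, wb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete wa
                curr[j - 1] + 1,           # insert wb
                prev[j - 1] + (wa != wb),  # substitute (free when the words match)
            )
        prev = curr
    return prev[len(b)]


def select_with_proxies(nbest: Sequence[str], proxies: Sequence[str]) -> str:
    """HYPOTHETICAL proxy-guided selection (an assumption, not the paper's rule):
    return the n-best entry whose mean word edit distance to the proxy
    transcripts is smallest. Ties keep the earlier, higher-ranked entry,
    because min() returns the first minimal element."""
    if not nbest:
        raise ValueError("n-best list is empty")
    if not proxies:
        return nbest[0]  # no side information: fall back to the CTC 1-best
    proxy_tokens = [p.split() for p in proxies]

    def mean_distance(hyp: str) -> float:
        tokens = hyp.split()
        return sum(word_edit_distance(tokens, p) for p in proxy_tokens) / len(proxy_tokens)

    return min(nbest, key=mean_distance)


if __name__ == "__main__":
    # Toy n-best list (best-first) and two proxy transcripts; illustrative only.
    nbest = ["ذهبت الى السوق", "ذهب الى السوق", "ذهبت إلى السوق"]
    proxies = ["ذهبت إلى السوق", "ذهبت إلى السوق اليوم"]
    best = select_with_proxies(nbest, proxies)
    print(nbest.index(best))  # prints 2
```

In the toy example, the third hypothesis wins with a mean distance of 0.5, against 1.5 and 2.5 for the other two. Averaging over all proxies keeps a single noisy proxy from dominating the choice. Because selection only re-ranks the existing n-best list, the CTC model's parameters are never updated, which matches the test-time, no-retraining setting described above.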

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and Audio Processing