Fine-tuning Whisper for Pashto ASR: strategies and scale

Hanif Rahman

arXiv:2604.06507·cs.CL·April 9, 2026

TL;DR

This paper explores fine-tuning strategies for adapting Whisper speech recognition models to Pashto, a language absent from Whisper's pre-training corpus, and analyzes how each strategy performs and where it falls short.

Contribution

The paper systematically compares four fine-tuning approaches for Whisper on Pashto (vanilla full fine-tuning, LoRA, frozen-encoder, and multistage Urdu-to-Pashto transfer), identifying the most effective strategy and the shortcomings of the others.

Findings

01 Vanilla fine-tuning of whisper-base reaches a WER of 21.22% on CommonVoice Pashto v20, outperforming LoRA by 33.36 pp, frozen-encoder by 14.76 pp, and Urdu transfer by 44.56 pp.

02 Whisper-small achieves a WER of 24.89% on Pashto with 113 hours of data.

03 Online augmentation reduces WER by 7.25 percentage points.

Abstract

Pashto is absent from Whisper's pre-training corpus despite being one of CommonVoice's largest language collections, leaving off-the-shelf models unusable: all Whisper sizes output Arabic, Dari, or Urdu script on Pashto audio, achieving word error rates above 100%. We compare four fine-tuning strategies for whisper-base on CommonVoice Pashto v20: vanilla full fine-tuning, LoRA (rank 64), frozen-encoder (2/6 layers), and multistage Urdu-to-Pashto transfer. We extend vanilla fine-tuning to whisper-small and whisper-large-v3-turbo on CommonVoice Pashto v24 (113 hours). Vanilla fine-tuning achieves WER 21.22% on CV20, outperforming LoRA by 33.36 pp, frozen-encoder by 14.76 pp, and Urdu transfer by 44.56 pp. Frozen-encoder fine-tuning degrades performance on whisper-base (6 encoder layers): layer-function separation does not hold at this depth, and freezing removes a third of trainable…
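The abstract's remark that off-the-shelf models score word error rates above 100% can look odd at first. A minimal sketch of the standard WER definition (word-level edit distance divided by the number of reference words) shows why: insertions count as errors, so a hypothesis longer than the reference with no matching words pushes the ratio past 1. The function name `word_error_rate` and the toy tokens are illustrative assumptions, not the paper's evaluation code, and the paper may apply text normalization that is not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count.

    Illustrative helper (hypothetical name); no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Toy placeholder tokens, not real Pashto transcripts: a 3-word reference
# against a 5-word hypothesis with no words in common.
reference = "a b c"
hypothesis = "x y z w v"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.2%}")
```

In this toy case the hypothesis needs three substitutions and two insertions against a three-word reference, so the ratio exceeds 1. A model that emits the wrong script for every word, and emits more words than the reference contains, lands in the same regime.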

Code & Models

Models

🤗
nassimaODL/whisper-small-arabic-cv18-lora
model· 70 dl· ♡ 1
