Benchmarking Multilingual Speech Models on Pashto: Zero-Shot ASR, Script Failure, and Cross-Domain Evaluation

Hanif Rahman

arXiv:2604.04598 · cs.CL · April 7, 2026

TL;DR

This paper evaluates multilingual speech models on Pashto, covering zero-shot performance, script fidelity failures, and cross-domain robustness, and provides the first reproducible multi-model ASR benchmark on public Pashto test sets.

Contribution

It offers the first reproducible evaluation of multilingual models on public Pashto data (FLEURS and Common Voice 24), reporting zero-shot WER, script-level failure, and cross-domain evaluation of fine-tuned models.

Findings

01

SeamlessM4T achieves 39.7% WER on a filtered Common Voice 24 Pashto subset, the best zero-shot result reported as of submission.

02

Zero-shot Whisper WER ranges from 90% to 297%; Whisper medium collapses to 461% on Common Voice 24, consistent with decoder looping.

03

Pashto script fidelity exceeds 93% in some models, but WER alone masks script failure.
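Why WER can hide a script failure: a Latin transliteration of the correct word and a wrong Arabic-script word each count as one substitution, so WER cannot tell them apart. A script-level score separates the two cases. The summary does not define how the paper computes script fidelity, so the sketch below is an assumption: a hypothetical helper `script_fidelity` that returns the share of letters in a hypothesis lying in the Unicode Arabic block (U+0600 to U+06FF), which contains Pashto letters such as ټ, ځ, څ, ډ, ښ and ګ. Persian and Urdu output uses the same block, so this score cannot detect language confusion; a language-identification check is needed for that.

```python
# Hypothetical script-fidelity score (an assumption, not the paper's metric):
# the share of letters in an ASR hypothesis that fall in the Unicode Arabic
# block U+0600..U+06FF, which also holds Pashto letters such as ټ, ځ, څ, ډ, ښ, ګ.
ARABIC_BLOCK = range(0x0600, 0x0700)


def script_fidelity(text: str) -> float:
    """Fraction of alphabetic characters in `text` that are Arabic-block code points."""
    # str.isalpha() skips spaces, digits, punctuation and combining diacritics
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0  # empty or letter-free output counts as no Arabic-script text
    in_block = sum(1 for ch in letters if ord(ch) in ARABIC_BLOCK)
    return in_block / len(letters)


print(script_fidelity("سلام"))   # Arabic-script output: 1.0
print(script_fidelity("salam"))  # Latin transliteration of the same word: 0.0
```

Averaging this score over a test set gives a per-model script rate that can be reported next to WER.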

Abstract

Pashto is spoken by approximately 60 to 80 million people but has no published benchmarks for multilingual automatic speech recognition (ASR) on any shared public test set. This paper reports the first reproducible multi-model evaluation on public Pashto data, covering zero-shot ASR, script-level failure, and cross-domain evaluation of fine-tuned models. For zero-shot ASR, ten models (all seven Whisper sizes, MMS-1B, SeamlessM4T-v2-large, and OmniASR-CTC-300M) are evaluated on the FLEURS Pashto test set and a filtered Common Voice 24 subset. Zero-shot Whisper WER ranges from 90% to 297%, and the medium model collapses to 461% on Common Voice 24, consistent with decoder looping. SeamlessM4T achieves 39.7% WER on Common Voice 24 (the best zero-shot result reported as of submission); MMS-1B achieves 43.8% on FLEURS. For script failure, a language-identification audit shows that no…
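WER values above 100%, such as 297% and 461%, are possible because WER = (S + D + I) / N divides substitutions, deletions and insertions by the number of reference words N, and insertions are unbounded. A decoder that loops keeps emitting words, and each extra word adds an insertion. The sketch below is a plain word-level Levenshtein implementation, not the paper's scoring code; `word_error_rate` is a hypothetical helper, and it assumes whitespace tokenization with no text normalization, which the paper's pipeline may handle differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance (sketch)."""
    ref = reference.split()  # assumption: whitespace tokenization, no normalization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "زه کور ته ځم"           # four reference words
looping = " ".join(["زه"] * 12)       # decoder repeats one word twelve times
print(word_error_rate(reference, looping))  # 2.75, i.e. 275% WER
```

In the looping case one word matches, three are substituted and eight are inserted, giving 11 errors over 4 reference words.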
