Script collapse in multilingual ASR: A reference-free metric and 100-pair benchmark
Hanif Rahman

TL;DR
This paper introduces Script Fidelity Rate (SFR), a reference-free metric for detecting script collapse in multilingual ASR models. It measures systematic failures across ten languages and ten models, and shows that script-aware prompting can restore script accuracy.
Contribution
The paper proposes SFR as a new reference-free metric for script fidelity, benchmarks script collapse across multiple models and languages, and demonstrates prompt-based fixes to improve script accuracy.
Findings
21 of 100 evaluated model-language pairs (21%) exhibit script collapse, defined as SFR below 10%.
In a ten-language Gemma 4 probe, script-aware prompting raises mean SFR from 71.2% to 97.7%.
Four common script collapse patterns are identified.
Abstract
Word error rate (WER) is the dominant metric for automatic speech recognition, yet it cannot detect a systematic failure mode: models that produce fluent output in the wrong writing system. We define Script Fidelity Rate (SFR), the fraction of hypothesis characters in the target script block, computable without reference transcriptions, and report a systematic measurement of script collapse across ten languages spanning six writing systems and ten models (seven Whisper sizes, MMS-1B, SeamlessM4T-v2, and Gemma 4 E2B) on FLEURS test sets. Across 100 evaluated model-language pairs, 21 (21%; 95% Wilson CI: 14-30%) exhibit script collapse (SFR less than 10%): 20 involve Whisper and one involves Gemma 4 E2B on Urdu under a generic transcription prompt. In a ten-language Gemma 4 probe, script-aware prompting raises mean SFR from 71.2% to 97.7%, fixes Urdu collapse (6.5% to 97.0%), and recovers…
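As a rough illustration of the two quantities in the abstract, the sketch below computes an SFR-style score for a hypothesis string and reproduces the Wilson interval arithmetic for 21 collapsed pairs out of 100. It is not the paper's implementation. The names `SCRIPT_RANGES`, `script_fidelity_rate`, `is_collapsed`, and `wilson_interval` are hypothetical, the code-point ranges are an assumed approximation of each script's Unicode blocks, and counting only letters (ignoring spaces, digits, punctuation, and combining marks) is an assumption the abstract does not confirm.

```python
import math

# Hypothetical script table: approximate Unicode code-point ranges per script.
# The paper's exact block definitions are not given in the abstract.
SCRIPT_RANGES = {
    "Arabic": [(0x0600, 0x06FF), (0x0750, 0x077F), (0xFB50, 0xFDFF), (0xFE70, 0xFEFF)],
    "Devanagari": [(0x0900, 0x097F)],
    "Cyrillic": [(0x0400, 0x04FF)],
    "Latin": [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x024F)],
}


def script_fidelity_rate(hypothesis: str, script: str) -> float:
    """Fraction of letters in `hypothesis` that fall inside the target script.

    Assumption: only characters for which str.isalpha() is true are counted,
    so whitespace, digits, punctuation, and combining marks are ignored.
    No reference transcription is needed.
    """
    ranges = SCRIPT_RANGES[script]
    letters = [ch for ch in hypothesis if ch.isalpha()]
    if not letters:
        # Assumption: an empty hypothesis scores 0.0 rather than being undefined.
        return 0.0
    in_script = sum(
        1 for ch in letters if any(lo <= ord(ch) <= hi for lo, hi in ranges)
    )
    return in_script / len(letters)


def is_collapsed(sfr: float, threshold: float = 0.10) -> bool:
    """Script collapse as defined in the abstract: SFR below 10%."""
    return sfr < threshold


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Urdu written in Arabic script versus a romanized rendering of the same phrase.
    print(script_fidelity_rate("یہ اردو ہے", "Arabic"))
    print(script_fidelity_rate("yeh urdu hai", "Arabic"))
    print(wilson_interval(21, 100))
```

Under these assumptions, a fully romanized Urdu hypothesis scores 0 and is flagged as collapsed regardless of how fluent it is, which is the failure mode WER alone does not isolate. The interval for 21 of 100 works out to roughly 14% to 30%, matching the figure reported in the abstract.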
