What is lost in Normalization? Exploring Pitfalls in Multilingual ASR Model Evaluations
Kavya Manohar, Leena G Pillai, Elizabeth Sherly

TL;DR
This paper critically examines the flaws in current text normalization practices used in evaluating multilingual ASR models, especially for Indic scripts, revealing how these practices can distort performance metrics and proposing more linguistically informed normalization methods.
Contribution
It identifies specific pitfalls in existing normalization routines for Indic scripts and advocates for linguistically informed normalization to improve evaluation accuracy.
Findings
Normalization routines can artificially inflate performance metrics.
Current practices often ignore linguistic nuances of Indic scripts.
Proposed normalization methods improve evaluation robustness.
Abstract
This paper explores the pitfalls in evaluating multilingual automatic speech recognition (ASR) models, with a particular focus on Indic language scripts. We investigate the text normalization routines employed by leading ASR models, including OpenAI Whisper, Meta's MMS, Seamless, and Assembly AI's Conformer, and their unintended consequences on performance metrics. Our research reveals that current text normalization practices, which aim to standardize ASR outputs for fair comparison by removing inconsistencies such as variations in spelling, punctuation, and special characters, are fundamentally flawed when applied to Indic scripts. Through empirical analysis using text similarity scores and in-depth linguistic examination, we demonstrate that these flaws lead to artificially improved performance metrics for Indic languages. We conclude by proposing a shift towards developing text…
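To make the inflation effect concrete, the following is a minimal sketch and not the normalization code of any of the models named above. It assumes a hypothetical normalizer, `strip_marks`, that discards every Unicode mark character (general categories Mn, Mc, Me), which is one way an English-oriented cleanup step can damage Indic text, where vowel signs are encoded as marks. The helper `wer` is a plain word-level edit-distance implementation written for this illustration, and the Hindi sentence pair is an invented example.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Hypothetical aggressive normalizer (an assumption for illustration):
    decompose the text, then drop every Unicode mark (Mn, Mc, Me)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: the second word differs only in its vowel sign
# (U+0940 in the reference, U+093F in the hypothesis).
reference = "राम की किताब"
hypothesis = "राम कि किताब"

raw_wer = wer(reference, hypothesis)
normalized_wer = wer(strip_marks(reference), strip_marks(hypothesis))

print("WER without normalization:", raw_wer)
print("WER after stripping marks:", normalized_wer)
```

Under this assumption, both vowel signs are deleted, the two distinct words collapse to the same bare consonant, and the recognition error disappears from the score. A linguistically informed normalizer would need to keep such marks, since they carry meaning in Indic scripts.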
Taxonomy
Topics: Speech Recognition and Synthesis · Interpreting and Communication in Healthcare · Natural Language Processing Techniques
