WER We Stand: Benchmarking Urdu ASR Models

Samee Arif; Sualeha Farid; Aamina Jamal Khan; Mustafa Abbas; Agha Ali Raza; Awais Athar

arXiv:2409.11252 · cs.CL · June 9, 2025 · 2 citations

TL;DR

This paper benchmarks Urdu ASR models using WER across read and conversational speech datasets, introduces a new conversational dataset, and highlights challenges in low-resource language ASR evaluation.

Contribution

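It provides the first conversational speech dataset for benchmarking Urdu ASR and compares three model families (Whisper, MMS, and Seamless-M4T), revealing how performance differs between read and conversational speech and what makes evaluation difficult for a low-resource language.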

Findings

01 Seamless-large outperforms the other models on read speech.

02 Whisper-large performs best on conversational speech.

03 Evaluating ASR models for low-resource languages requires robust text normalization.
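Finding 03 concerns text normalization. The sketch below is illustrative only; it is not the normalizer used in the paper, which this page does not describe. The hypothetical function `normalize_urdu` assumes a few common choices: Unicode NFKC, mapping Arabic yeh, alef maksura, and kaf to their Urdu code points, dropping short-vowel diacritics (U+064B to U+0652), and replacing punctuation with spaces.

```python
import unicodedata

# Assumed character mappings (illustrative, not exhaustive): Arabic code points
# that often appear in Urdu text in place of the Urdu-specific letters.
CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def normalize_urdu(text: str) -> str:
    """Hypothetical normalizer, applied to references and hypotheses before WER."""
    # NFKC folds compatibility characters such as Arabic presentation forms.
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(CHAR_MAP)
    out = []
    for ch in text:
        if "\u064B" <= ch <= "\u0652":  # short-vowel diacritics (harakat): drop
            continue
        if unicodedata.category(ch).startswith("P"):  # punctuation, incl. U+06D4 URDU FULL STOP
            out.append(" ")
            continue
        out.append(ch)
    # Collapse runs of whitespace so that whitespace tokenization is stable.
    return " ".join("".join(out).split())
```

If the same function is applied to both the reference and the hypothesis before scoring, spelling variants such as Arabic kaf (U+0643) versus Urdu keheh (U+06A9) are no longer counted as substitutions. Which mappings are appropriate depends on the dataset's transcription conventions.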

Abstract

This paper presents a comprehensive evaluation of Urdu Automatic Speech Recognition (ASR) models. We analyze the performance of three ASR model families, Whisper, MMS, and Seamless-M4T, using Word Error Rate (WER), along with a detailed examination of the most frequent wrong words and of error types, including insertions, deletions, and substitutions. Our analysis uses two types of datasets: read speech and conversational speech. Notably, we present the first conversational speech dataset designed for benchmarking Urdu ASR models. We find that seamless-large outperforms the other ASR models on the read speech dataset, while whisper-large performs best on the conversational speech dataset. Furthermore, this evaluation highlights the complexities of assessing ASR models for low-resource languages like Urdu using quantitative metrics alone and emphasizes the need for a robust Urdu…
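The abstract scores models with WER and breaks errors down into insertions, deletions, and substitutions. As an illustration only (this page does not say which tool the authors used), here is a minimal sketch of the standard Levenshtein-based computation, WER = (S + D + I) / N, where N is the number of reference words. The function names `align_ops` and `corpus_wer` are hypothetical. Counting "wrong words" as reference words that were substituted or deleted is an assumption, and whitespace tokenization is itself a simplification for Urdu.

```python
from collections import Counter


def align_ops(ref, hyp):
    """Levenshtein-align two word lists and return the edit operations.

    Each op is ('S', ref_word, hyp_word), ('D', ref_word, None) or ('I', None, hyp_word).
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match or substitution
                           dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1)         # insertion
    # Walk back from the bottom-right cell, preferring diagonal moves on ties.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:
                ops.append(("S", ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("D", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("I", None, hyp[j - 1]))
            j -= 1
    return ops


def corpus_wer(pairs):
    """Corpus-level WER = (S + D + I) / N over (reference, hypothesis) string pairs."""
    counts = Counter({"S": 0, "D": 0, "I": 0})
    wrong_words = Counter()  # reference words that were substituted or deleted
    n_ref = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = ref_text.split(), hyp_text.split()
        for op, ref_word, _ in align_ops(ref, hyp):
            counts[op] += 1
            if op in ("S", "D"):
                wrong_words[ref_word] += 1
        n_ref += len(ref)
    wer = (counts["S"] + counts["D"] + counts["I"]) / n_ref
    return wer, counts, wrong_words


pairs = [("a b c d", "a x c"), ("p q", "p q r")]
wer, counts, wrong = corpus_wer(pairs)
print(wer)  # 0.5
```

In the toy example, the first pair contributes one substitution (b to x) and one deletion (d), and the second contributes one insertion (r), giving 3 errors over 6 reference words. When several alignments have the same cost, the tie-breaking order in the backtrace decides how errors are split among S, D, and I, so per-type counts can differ slightly between tools even when the total WER agrees.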

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Network Packet Processing and Optimization