A New Benchmark for Evaluating Automatic Speech Recognition in the Arabic Call Domain

Qusai Abo Obaidah; Muhy Eddin Za'ter; Adnan Jaljuli; Ali Mahboub; Asma Hakouz; Bashar Al-Rfooh; Yazan Estaitia

arXiv:2403.04280·cs.AI·May 31, 2024·1 cites

Open Access

TL;DR

This paper introduces a comprehensive benchmark for Arabic speech recognition in telephone call scenarios, addressing dialectal diversity, audio quality issues, and conversational styles to improve ASR system evaluation.

Contribution

It presents a new benchmark dataset tailored for Arabic call speech recognition, capturing dialectal and acoustic variability, and provides baseline performance evaluations.

Findings

01. Benchmark covers diverse dialects and call conditions.
02. Baseline ASR performance established on the new dataset.
03. Highlights challenges of Arabic ASR in telephonic environments.
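
The summary above does not say which metric the baselines are reported in. Word error rate (WER) is a common choice for ASR evaluation, so the following is a minimal sketch of how a hypothesis transcript could be scored against a reference under that assumption. The function name `word_error_rate` and the sample strings are hypothetical and are not taken from the paper or its dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: whitespace tokenization with no text normalization.
    Real Arabic ASR scoring usually normalizes diacritics and letter
    variants first; that step is deliberately left out of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transliterated call-center utterance, for illustration only.
    reference = "marhaba kif fini saadak"
    hypothesis = "marhaba kif saadak"
    print(f"WER = {word_error_rate(reference, hypothesis):.2f}")
```

Because the edit count is divided by the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words, which is worth keeping in mind for noisy call audio.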

Abstract

This work introduces a comprehensive benchmark for Arabic speech recognition, specifically tailored to the challenges of telephone conversations in the Arabic language. Arabic, characterized by its rich dialectal diversity and phonetic complexity, presents a number of unique challenges for automatic speech recognition (ASR) systems. These challenges are amplified in the domain of telephone calls, where poor audio quality, background noise, and conversational speech styles reduce recognition accuracy. Our work aims to establish a robust benchmark that not only encompasses the broad spectrum of Arabic dialects but also emulates the real-world conditions of call-based communications. By incorporating diverse dialectal expressions and accounting for the variable quality of call recordings, this benchmark seeks to provide a rigorous testing ground for the…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing