Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German

Sajjad Abdoli; Ghassan Al-Sumaidaee; Clayton W. Taylor; Ahmad ElShiekh; Ahmed Rashad

arXiv:2605.19069·cs.CL·May 22, 2026

TL;DR

This paper benchmarks five commercial ASR providers on code-switching speech across four language pairs (Egyptian Arabic, Saudi Arabic, Persian, and German, each paired with English), highlighting performance differences and proposing more reliable evaluation metrics.

Contribution

It provides a new benchmark dataset and evaluation pipeline for multilingual code-switching ASR, including a cost-effective scoring method and analysis of semantic similarity metrics.

Findings

1. ElevenLabs Scribe v2 achieved the lowest WER (13.2%) across all language pairs.
2. BERTScore proved more reliable than WER for Arabic and Persian code-switching evaluation.
3. Difficulty-stratified analysis revealed performance gaps hidden in aggregate metrics.

Abstract

Code-switching -- the natural alternation between two languages within a single utterance -- represents one of the most challenging and under-studied conditions for automatic speech recognition (ASR). Existing commercial ASR benchmarks predominantly evaluate clean, monolingual audio and report a single Word Error Rate (WER) figure that tells practitioners little about real-world multilingual performance. We present a benchmark evaluating five commercial ASR providers across four language pairs: Egyptian Arabic--English, Saudi Arabic (Najdi/Hijazi)--English, Persian (Farsi)--English, and German--English. Each dataset comprises 300 samples selected by a two-stage pipeline: a heuristic filter scoring transcripts on five structural code-switching signals, followed by a GPT-4o and Gemini 1.5 Pro ensemble scoring candidates across six linguistic dimensions. This pipeline reduces LLM scoring…
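To make the point about a single WER figure concrete, the following is a minimal sketch of corpus-level WER computed as word-level edit distance divided by the number of reference words, then broken down by a difficulty bucket. It is not the paper's scoring pipeline: the function names, the bucket labels, and the toy German--English sentences are all assumptions made for illustration, and no text normalization (casing, punctuation, script handling) is applied.

```python
from collections import defaultdict


def word_edit_distance(ref_tokens, hyp_tokens):
    """Levenshtein distance between two token lists (unit costs)."""
    prev = list(range(len(hyp_tokens) + 1))
    for i, ref_tok in enumerate(ref_tokens, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp_tokens, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(pairs):
    """Corpus-level WER: total edits divided by total reference words."""
    edits = 0
    ref_words = 0
    for ref, hyp in pairs:
        ref_tokens = ref.split()
        edits += word_edit_distance(ref_tokens, hyp.split())
        ref_words += len(ref_tokens)
    if ref_words == 0:
        raise ValueError("reference text contains no words")
    return edits / ref_words


def wer_by_bucket(samples):
    """Group (bucket, reference, hypothesis) triples and score each bucket."""
    buckets = defaultdict(list)
    for bucket, ref, hyp in samples:
        buckets[bucket].append((ref, hyp))
    return {bucket: wer(pairs) for bucket, pairs in buckets.items()}


# Hypothetical toy data; the bucket labels are illustrative, not the paper's.
samples = [
    ("easy", "please open the file", "please open the file"),
    ("easy", "send it to my email", "send it to my email"),
    ("hard", "ich habe das meeting gecancelt", "ich habe das meeting cancelled"),
    ("hard", "wir sollten den report updaten", "wir sollten report update"),
]

overall = wer([(ref, hyp) for _, ref, hyp in samples])
per_bucket = wer_by_bucket(samples)
print(f"overall WER: {overall:.3f}")
for bucket, score in per_bucket.items():
    print(f"{bucket}: {score:.3f}")
```

On this toy data the aggregate counts 3 errors over 19 reference words, while all 3 errors fall in the 10 reference words of the "hard" bucket, which is the kind of gap an aggregate figure can hide.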

Code & Models

Repositories

https://huggingface.co/datasets/Perle-ai/ASR_Code_Switch

Datasets

Perle-ai/ASR_Code_Switch
dataset · 1.3k downloads
