Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition
Injy Hamed, Amir Hussein, Oumnia Chellah, Shammur Chowdhury, Hamdy Mubarak, Sunayana Sitaram, Nizar Habash, Ahmed Ali

TL;DR
This paper introduces a benchmark dataset of code-switching speech recognition hypotheses with human judgments, and evaluates a range of automatic speech recognition metrics by how well they correlate with those judgments, with the aim of fair and robust evaluation.
Contribution
It develops the first benchmark dataset with human judgments for code-switching speech recognition and systematically evaluates metrics for correlation with human preferences.
Findings
Transliteration followed by text normalization yields the highest correlation with human judgments.
The benchmark dataset includes dialectal Arabic/English conversational speech.
The metrics compared vary in representation (orthographic, phonological, semantic), directness (intrinsic vs. extrinsic), granularity (e.g. word, character), and similarity computation method.
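The first finding, transliteration followed by text normalization, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the two-entry lexicon `TOY_TRANSLIT`, the choice to map Arabic-script renderings of English words to Latin script, and the specific normalization steps (Unicode NFKC, lowercasing, punctuation removal). The paper's actual transliteration system and normalization rules are not described in this summary.

```python
import re
import unicodedata

# Hypothetical toy lexicon mapping Arabic-script spellings of English
# words to Latin script. Not the paper's transliteration system.
TOY_TRANSLIT = {"ميتنج": "meeting", "اوكي": "okay"}


def preprocess(text):
    """Transliterate token by token, then normalize the text."""
    tokens = [TOY_TRANSLIT.get(tok, tok) for tok in text.split()]
    text = " ".join(tokens)
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # drop punctuation
    return text.split()


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(reference, hypothesis, use_preprocessing=True):
    """Word error rate: edit distance divided by reference length."""
    if use_preprocessing:
        ref, hyp = preprocess(reference), preprocess(hypothesis)
    else:
        ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


reference = "عندي meeting بكرة"
hypothesis = "عندي ميتنج بكرة."
raw_wer = wer(reference, hypothesis, use_preprocessing=False)
processed_wer = wer(reference, hypothesis)
```

In this toy pair the hypothesis writes "meeting" in Arabic script and adds a trailing period. Plain word matching counts both as errors, while the preprocessed comparison treats the two strings as identical, which is the kind of script-level mismatch the finding is concerned with.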
Abstract
Code-switching poses a number of challenges and opportunities for multilingual automatic speech recognition. In this paper, we focus on the question of robust and fair evaluation metrics. To that end, we develop a reference benchmark data set of code-switching speech recognition hypotheses with human judgments. We define clear guidelines for minimal editing of automatic hypotheses. We validate the guidelines using 4-way inter-annotator agreement. We evaluate a large number of metrics in terms of correlation with human judgments. The metrics we consider vary in terms of representation (orthographic, phonological, semantic), directness (intrinsic vs extrinsic), granularity (e.g. word, character), and similarity computation method. The highest correlation to human judgment is achieved using transliteration followed by text normalization. We release the first corpus for human acceptance of…
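The evaluation described in the abstract, scoring each metric by its correlation with human judgments, can be sketched as follows. This is an illustration under stated assumptions: Spearman rank correlation is used here as one common choice, while the coefficient the authors actually report is not stated in this summary, and the score lists are invented toy values, not results from the paper.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy per-utterance scores (illustrative only, not from the paper).
metric_scores = [0.10, 0.25, 0.40, 0.55]  # e.g. an automatic error rate
human_scores = [0.05, 0.20, 0.50, 0.45]   # e.g. a human-derived error rate
rho = spearman(metric_scores, human_scores)
```

Running the same computation for each candidate metric against the same human scores gives one coefficient per metric, which is what allows the metrics to be ranked against each other.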
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
