Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification
Badr M. Abdullah, Matthew Baas, Bernd Möbius, Dietrich Klakow

TL;DR
This paper uses voice conversion to train Arabic dialect identification (ADI) models that generalize better to out-of-domain speech, improving cross-domain accuracy by up to +34.1% and mitigating speaker bias in the ADI dataset, in support of inclusive speech technology development.
Contribution
The paper presents a voice conversion-based approach for training Arabic dialect identification models that achieves state-of-the-art performance, improves cross-domain robustness, and helps mitigate speaker bias in the training data.
Findings
Consistent accuracy improvements of up to +34.1% on a newly collected real-world test set spanning four domains
Voice conversion reduces speaker bias in ADI datasets
Achieves state-of-the-art ADI performance while significantly improving robustness in cross-domain scenarios
Abstract
Arabic dialect identification (ADI) systems are essential for large-scale data collection pipelines that enable the development of inclusive speech technologies for Arabic language varieties. However, the reliability of current ADI systems is limited by poor generalization to out-of-domain speech. In this paper, we present an effective approach based on voice conversion for training ADI models that achieves state-of-the-art performance and significantly improves robustness in cross-domain scenarios. Evaluated on a newly collected real-world test set spanning four different domains, our approach yields consistent improvements of up to +34.1% in accuracy across domains. Furthermore, we present an analysis of our approach and demonstrate that voice conversion helps mitigate the speaker bias in the ADI dataset. We release our robust ADI model and cross-domain evaluation dataset to support…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
