Speech Recognition Challenge in the Wild: Arabic MGB-3
Ahmed Ali, Stephan Vogel, Steve Renals

TL;DR
The Arabic MGB-3 Challenge targets speech recognition for dialectal Arabic across seven genres of Egyptian YouTube video, adds a dialect identification task, and reports results for the ten systems submitted by thirteen participating teams.
Contribution
This paper introduces the Arabic MGB-3 Challenge, which covers dialectal Arabic speech recognition and dialect identification across multiple genres, and presents detailed evaluation results.
Findings
Thirteen teams participated, submitting ten systems.
Significant progress was achieved in dialectal Arabic speech recognition.
Effective dialect identification methods were demonstrated.
Abstract
This paper describes the Arabic MGB-3 Challenge - Arabic Speech Recognition in the Wild. Unlike last year's Arabic MGB-2 Challenge, for which the recognition task was based on more than 1,200 hours of broadcast TV news recordings from Aljazeera Arabic TV programs, MGB-3 emphasises dialectal Arabic using a multi-genre collection of Egyptian YouTube videos. Seven genres were used for the data collection: comedy, cooking, family/kids, fashion, drama, sports, and science (TEDx). A total of 16 hours of videos, split evenly across the different genres, were divided into adaptation, development and evaluation data sets. The Arabic MGB-Challenge comprised two tasks: A) Speech transcription, evaluated on the MGB-3 test set, along with the 10-hour MGB-2 test set to report progress on the MGB-2 evaluation; B) Arabic dialect identification, introduced this year in order to distinguish between four…
