MIT-QCRI Arabic Dialect Identification System for the 2017 Multi-Genre Broadcast Challenge

Suwon Shon; Ahmed Ali; James Glass

arXiv:1709.00387·cs.CL·September 4, 2017

TL;DR

This paper presents an Arabic Dialect Identification (ADI) system for the 2017 MGB-3 challenge that combines Siamese neural networks with i-vector post-processing to distinguish four Arabic dialects and Modern Standard Arabic despite domain mismatch, reaching 75% accuracy on the test set.

Contribution

It introduces a novel ADI system utilizing Siamese neural networks and i-vector post-processing to handle dialect variability and domain mismatches.

Findings

01 Achieved 75% accuracy on the test set

02 Effectively distinguished four dialects and Modern Standard Arabic

03 Demonstrated robustness against domain mismatches

Abstract

In order to successfully annotate the Arabic speech content found in open-domain media broadcasts, it is essential to be able to process a diverse set of Arabic dialects. For the 2017 Multi-Genre Broadcast challenge (MGB-3) there were two possible tasks: Arabic speech recognition, and Arabic Dialect Identification (ADI). In this paper, we describe our efforts to create an ADI system for the MGB-3 challenge, with the goal of distinguishing amongst four major Arabic dialects, as well as Modern Standard Arabic. Our research focused on dialect variability and domain mismatches between the training and test domain. In order to achieve a robust ADI system, we explored both Siamese neural network models to learn similarity and dissimilarities among Arabic dialects, as well as i-vector post-processing to adapt domain mismatches. Both acoustic and linguistic features were used for the final…
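To make the idea of learning "similarity and dissimilarities" concrete, here is a minimal sketch of a generic contrastive objective of the kind often used to train Siamese networks. It is an illustration only, not necessarily the loss used in the paper: the function names, the cosine-based form, and the `margin` value are all assumptions, and the shared-weight network that would produce the embeddings is omitted (the inputs are treated as already-computed utterance embeddings, such as i-vectors).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(a, b, same_dialect, margin=0.5):
    """Hypothetical cosine-based contrastive loss for one embedding pair.

    `margin` is an illustrative assumption, not a value from the paper.
    """
    sim = cosine_similarity(a, b)
    if same_dialect:
        # Pull same-dialect pairs together: zero loss when sim == 1.
        return (1.0 - sim) ** 2
    # Push different-dialect pairs apart until sim falls below the margin.
    return max(0.0, sim - margin) ** 2


# Toy 2-D embeddings (made up for illustration).
utt_a = [1.0, 0.0]
utt_b = [1.0, 0.0]
utt_c = [0.0, 1.0]

print(contrastive_loss(utt_a, utt_b, same_dialect=True))
print(contrastive_loss(utt_a, utt_c, same_dialect=False))
print(contrastive_loss(utt_a, utt_b, same_dialect=False))
```

In this sketch, identical embeddings labeled as the same dialect and orthogonal embeddings labeled as different dialects both incur no loss, while identical embeddings labeled as different dialects are penalized.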
