OSN-MDAD: Machine Translation Dataset for Arabic Multi-Dialectal Conversations on Online Social Media
Fatimah Alzamzami, Abdulmotaleb El Saddik

TL;DR
This paper introduces a new multidialect Arabic dataset derived from social media content, designed to improve machine translation for informal Arabic dialects, validated by neural MT models showing superior performance.
Contribution
The paper presents a novel social media-based multidialect Arabic dataset and a universal translation guideline, addressing limitations of existing dialectal Arabic translation resources.
Findings
Neural MT models trained on the dataset outperform existing models.
The dataset effectively captures dialectal variations for social media content.
Proposed translation guidelines are broadly applicable.
Abstract
While resources for the English language are fairly sufficient for understanding content on social media, similar resources for Arabic are still immature. The main reason is that Arabic has many dialects in addition to its standard version, Modern Standard Arabic (MSA). Arabs do not use MSA in their daily communications; rather, they use dialectal versions. Unfortunately, social media users carry this habit over to social media platforms, which in turn has raised an urgent need for building suitable AI models for language-dependent applications. Existing machine translation (MT) systems designed for MSA fail to work well with Arabic dialects. In light of this, it is necessary to adapt to the informal nature of communication on social networks by developing MT systems that can effectively handle the various dialects of Arabic. Unlike for MSA that shows advanced…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
