Egyptian Arabic to English Statistical Machine Translation System for NIST OpenMT'2015
Hassan Sajjad, Nadir Durrani, Francisco Guzman, Preslav Nakov, Ahmed Abdelali, Stephan Vogel, Wael Salloum, Ahmed El Kholy, Nizar Habash

TL;DR
This paper presents the QCN group's Egyptian Arabic-to-English phrase-based SMT system for informal dialectal Arabic as used in SMS, chat, and speech. It combines Arabic preprocessing and standardization with a range of modeling components, and it ranked second on all three genres at NIST OpenMT'2015.
Contribution
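The authors built a phrase-based SMT system for dialectal Arabic that combines Arabic preprocessing and standardization (using tools such as 3arrib and MADAMIRA) with a neural network joint model, a genre-based hierarchically-interpolated language model, and further components such as phrase-table merging and hypothesis combination.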
Findings
System ranked second on all three genres
Effective handling of informal dialectal Arabic
Integration of neural and traditional SMT features
Abstract
The paper describes the Egyptian Arabic-to-English statistical machine translation (SMT) system that the QCRI-Columbia-NYUAD (QCN) group submitted to the NIST OpenMT'2015 competition. The competition focused on informal dialectal Arabic, as used in SMS, chat, and speech. Thus, our efforts focused on processing and standardizing Arabic, e.g., using tools such as 3arrib and MADAMIRA. We further trained a phrase-based SMT system using state-of-the-art features and components such as operation sequence model, class-based language model, sparse features, neural network joint model, genre-based hierarchically-interpolated language model, unsupervised transliteration mining, phrase-table merging, and hypothesis combination. Our system ranked second on all three genres.
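The abstract names a genre-based hierarchically-interpolated language model but does not say how it is built. As a rough illustration of the general idea only, the sketch below interpolates toy unigram models in two levels: first within each genre, then across genres with extra weight on the target genre. The corpora, the uniform and fixed weights, and all function names are assumptions made for this example, not the authors' setup.

```python
from collections import Counter

def train_unigram(tokens, vocab):
    """Add-one smoothed unigram model over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def interpolate(models, weights):
    """Linear interpolation: p(w) = sum_i weights[i] * models[i][w]."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    vocab = models[0].keys()
    return {w: sum(lam * m[w] for lam, m in zip(weights, models)) for w in vocab}

# Toy sub-corpora grouped by genre (invented data, for illustration only).
corpora = {
    "sms": [["see", "you", "soon"], ["ok", "see", "you"]],
    "chat": [["ok", "ok", "you"]],
    "speech": [["see", "you", "soon", "ok"]],
}
vocab = sorted({w for docs in corpora.values() for d in docs for w in d})

# Level 1: interpolate the sub-corpus models inside each genre.
# Uniform weights are an assumption; tuned weights would normally be used.
genre_models = {}
for genre, docs in corpora.items():
    subs = [train_unigram(d, vocab) for d in docs]
    genre_models[genre] = interpolate(subs, [1 / len(subs)] * len(subs))

def genre_adapted_lm(target, in_weight=0.6):
    """Level 2: interpolate genre models, favoring the target genre."""
    others = [g for g in genre_models if g != target]
    out_weight = (1 - in_weight) / len(others)
    models = [genre_models[target]] + [genre_models[g] for g in others]
    weights = [in_weight] + [out_weight] * len(others)
    return interpolate(models, weights)

sms_lm = genre_adapted_lm("sms")
```

In a real system the interpolation weights would typically be tuned on held-out data from the target genre (for example, to minimize perplexity), and the component models would be higher-order n-gram models rather than unigrams.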
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
