ArzEn-MultiGenre: An aligned parallel dataset of Egyptian Arabic song lyrics, novels, and subtitles, with English translations

Rania Al-Sabbagh

arXiv:2508.01411 · cs.CL · August 5, 2025

TL;DR

ArzEn-MultiGenre is a manually translated and aligned parallel dataset of Egyptian Arabic song lyrics, novels, and TV show subtitles with English translations. It is intended to support machine translation, linguistic research, and translation education.

Contribution

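The paper introduces a gold-standard parallel dataset of Egyptian Arabic and English that covers genres (song lyrics, novels, and TV show subtitles) not found in existing parallel datasets for this language pair, so it can be used to benchmark and train translation models.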

Findings

1. Contains 25,557 aligned segment pairs.

2. Enables benchmarking of machine translation models.

3. Supports linguistic and translation research.

Abstract

ArzEn-MultiGenre is a parallel dataset of Egyptian Arabic song lyrics, novels, and TV show subtitles that are manually translated and aligned with their English counterparts. The dataset contains 25,557 segment pairs that can be used to benchmark new machine translation models, fine-tune large language models in few-shot settings, and adapt commercial machine translation applications such as Google Translate. Additionally, the dataset is a valuable resource for research in various disciplines, including translation studies, cross-linguistic analysis, and lexical semantics. The dataset can also serve pedagogical purposes by training translation students and aid professional translators as a translation memory. The contributions are twofold: first, the dataset features textual genres not found in existing parallel Egyptian Arabic and English datasets, and second, it is a gold-standard…
