A Tale of Two Scripts: Transliteration and Post-Correction for Judeo-Arabic
Juan Moreno Gonzalez, Bashar Alhafni, Nizar Habash

TL;DR
This paper presents a two-step method for transliterating Judeo-Arabic into Arabic script: a character-level mapping followed by post-correction of grammatical and orthographic errors. It shows that the transliterated text can be processed by existing Arabic NLP tools, and it reports the first benchmark evaluation of LLMs on the task.
Contribution
The paper introduces a two-step transliteration approach for Judeo-Arabic (character-level mapping followed by post-correction) and provides the first benchmark evaluation of LLMs on this task.
Findings
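Transliteration enables Arabic NLP tools to perform morphosyntactic tagging on Judeo-Arabic texts.
Transliteration also enables machine translation of Judeo-Arabic texts, which would not have been feasible on the original Hebrew-script texts.
The paper reports the first benchmark evaluation of LLMs on Judeo-Arabic transliteration.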
Abstract
Judeo-Arabic refers to Arabic variants historically spoken by Jewish communities across the Arab world, primarily during the Middle Ages. Unlike standard Arabic, it is written in Hebrew script by Jewish writers and for Jewish audiences. Transliterating Judeo-Arabic into Arabic script is challenging due to ambiguous letter mappings, inconsistent orthographic conventions, and frequent code-switching into Hebrew. In this paper, we introduce a two-step approach to automatically transliterate Judeo-Arabic into Arabic script: a simple character-level mapping followed by post-correction to address grammatical and orthographic errors. We also present the first benchmark evaluation of LLMs on this task. Finally, we show that transliteration enables Arabic NLP tools to perform morphosyntactic tagging and machine translation, which would not have been feasible on the original texts. We make our code…
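The two-step idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the letter table `HEBREW_TO_ARABIC` is a commonly used one-to-one correspondence assumed here for demonstration, and the names `map_characters`, `post_correct`, and `transliterate` are hypothetical. The post-correction step is left as a pluggable callable because the abstract does not specify the model used for it.

```python
from typing import Callable, Optional

# Assumed one-to-one letter table (illustrative only, not taken from the paper).
# Final-form Hebrew letters map to the same Arabic letter as their regular forms.
HEBREW_TO_ARABIC = {
    "א": "ا", "ב": "ب", "ג": "ج", "ד": "د", "ה": "ه", "ו": "و",
    "ז": "ز", "ח": "ح", "ט": "ط", "י": "ي", "כ": "ك", "ך": "ك",
    "ל": "ل", "מ": "م", "ם": "م", "נ": "ن", "ן": "ن", "ס": "س",
    "ע": "ع", "פ": "ف", "ף": "ف", "צ": "ص", "ץ": "ص", "ק": "ق",
    "ר": "ر", "ש": "ش", "ת": "ت",
}


def map_characters(text: str) -> str:
    """Step 1: replace each Hebrew letter with one Arabic letter.

    Characters without an entry in the table (spaces, digits,
    punctuation) are passed through unchanged.
    """
    return "".join(HEBREW_TO_ARABIC.get(ch, ch) for ch in text)


def post_correct(text: str, corrector: Optional[Callable[[str], str]] = None) -> str:
    """Step 2: apply a correction function to the raw mapping output.

    `corrector` is a hypothetical stand-in for whatever post-correction
    model is used; when none is supplied the text is returned unchanged.
    """
    return corrector(text) if corrector is not None else text


def transliterate(text: str, corrector: Optional[Callable[[str], str]] = None) -> str:
    """Run the character mapping, then the post-correction step."""
    return post_correct(map_characters(text), corrector)


if __name__ == "__main__":
    # Hebrew-script spelling of the Arabic word for "book".
    print(transliterate("כתאב"))
```

A fixed table like this picks exactly one Arabic letter per Hebrew letter, so it cannot resolve the ambiguous mappings or inconsistent spellings the abstract mentions; that is the gap the post-correction step is meant to close.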
Taxonomy
Topics: Language, Linguistics, Cultural Analysis · Historical and Linguistic Studies · Medieval and Classical Philosophy
