Transliteration of Judeo-Arabic Texts into Arabic Script Using Recurrent Neural Networks
Ori Terner, Kfir Bar, Nachum Dershowitz

TL;DR
This paper presents a neural network-based method for transliterating Judeo-Arabic texts into Arabic script. Using CTC training, a pretraining stage, and synthetically generated data, it reports a 2% character error rate against a 9.5% baseline.
Contribution
The authors develop a recurrent neural network model with CTC loss and pretraining, effectively handling data limitations and context ambiguities in Judeo-Arabic transliteration.
Findings
Achieved 2% character error rate in transliteration
Synthetic data generation improved model performance
Using context reduced the error rate from 2.5% to 2%
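The figures above are character error rates. As a rough illustration of how such a metric is commonly computed, here is a minimal sketch that divides the Levenshtein edit distance by the reference length. The function names are hypothetical, and the paper may define or normalize its error rate differently.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # delete ref_char
                curr[j - 1] + 1,                  # insert hyp_char
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the reference length (an assumed definition)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, a 2% rate corresponds to roughly one edit per 50 reference characters.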
Abstract
We trained a model to automatically transliterate Judeo-Arabic texts into Arabic script, enabling Arabic readers to access those writings. We employ a recurrent neural network (RNN), combined with the connectionist temporal classification (CTC) loss to deal with unequal input/output lengths. This obligates adjustments in the training data to avoid input sequences that are shorter than their corresponding outputs. We also utilize a pretraining stage with a different loss function to improve network convergence. Since only a single source of parallel text was available for training, we take advantage of the possibility of generating data synthetically. We train a model that has the capability to memorize words in the output language, and that also utilizes context for distinguishing ambiguities in the transliteration. We obtain an improvement over the baseline 9.5% character error, achieving…
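The CTC length constraint mentioned in the abstract can be made concrete with a small sketch. CTC emits one symbol (or a blank) per input step and then collapses repeats, so the input must be at least as long as the output, plus one extra step for every pair of identical adjacent output characters. The code below checks that bound and pads a too-short input. Padding is only one possible adjustment, assumed here for illustration; the abstract does not say which adjustment the authors applied, and the names `BLANK`, `PAD`, `min_ctc_input_length`, `pad_source`, and `ctc_collapse` are hypothetical.

```python
BLANK = "_"  # assumed blank symbol for this illustration
PAD = "#"    # assumed padding symbol appended to short inputs


def min_ctc_input_length(target: str) -> int:
    """Fewest input steps for which CTC can emit `target`.

    Identical adjacent characters need a blank between them,
    so each such pair costs one extra step.
    """
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def pad_source(source: str, target: str, pad: str = PAD) -> str:
    """Pad `source` on the right until CTC can align it with `target`."""
    needed = min_ctc_input_length(target)
    if len(source) >= needed:
        return source
    return source + pad * (needed - len(source))


def ctc_collapse(path: str, blank: str = BLANK) -> str:
    """Map a frame-level CTC path to its output: merge repeats, drop blanks."""
    output = []
    previous = None
    for symbol in path:
        if symbol != previous and symbol != blank:
            output.append(symbol)
        previous = symbol
    return "".join(output)
```

For example, the target `"aab"` needs at least four input steps, because the two `a` characters must be separated by a blank in any valid path such as `"a_ab"`.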
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Topic Modeling
