Transliteration of Judeo-Arabic Texts into Arabic Script Using Recurrent Neural Networks

Ori Terner; Kfir Bar; Nachum Dershowitz

arXiv:2004.11405 · cs.CL · October 22, 2020 · 1 citation

TL;DR

This paper presents a neural network-based method for transliterating Judeo-Arabic texts into Arabic script. Specialized training techniques and synthetic data generation bring the character error rate down from a 9.5% baseline to 2%.

Contribution

The authors develop a recurrent neural network model with CTC loss and pretraining, effectively handling data limitations and context ambiguities in Judeo-Arabic transliteration.

Findings

01. Achieved 2% character error rate in transliteration

02. Synthetic data generation improved model performance

03. Context utilization reduces errors from 2.5% to 2%
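The figures above are character error rates. This summary does not give the exact definition used in the paper, so the following is a minimal sketch that assumes the common one: Levenshtein edit distance between the reference and the model output, divided by the reference length. The function names `edit_distance` and `character_error_rate` are illustrative and do not come from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Assumed CER definition: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition, a 2% rate corresponds to roughly one character edit per 50 reference characters.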

Abstract

We trained a model to automatically transliterate Judeo-Arabic texts into Arabic script, enabling Arabic readers to access those writings. We employ a recurrent neural network (RNN), combined with the connectionist temporal classification (CTC) loss to deal with unequal input/output lengths. This requires adjusting the training data to avoid input sequences that are shorter than their corresponding outputs. We also use a pretraining stage with a different loss function to improve network convergence. Since only a single source of parallel text was available for training, we take advantage of the possibility of generating data synthetically. We train a model that can memorize words in the output language and that also uses context to resolve ambiguities in the transliteration. We obtain an improvement over the baseline 9.5% character error, achieving…
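The length constraint mentioned in the abstract follows from how CTC works: the network emits one label (or a blank) per input step, and repeated labels are collapsed, so the input must have at least as many steps as the target has characters, plus one extra step for each pair of identical adjacent target characters. The sketch below checks that bound and pads short inputs. The padding strategy and the `filler` symbol are assumptions made for illustration; the abstract does not say which adjustment the authors actually applied.

```python
def min_ctc_input_length(target: str) -> int:
    """Smallest number of input steps for which CTC can emit `target`.

    CTC collapses repeated labels, so two identical adjacent target
    characters need a blank between them, which costs one extra step.
    """
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def pad_for_ctc(source: str, target: str, filler: str = "_") -> str:
    """Append a hypothetical filler symbol until `source` is long enough.

    This is one possible adjustment, not necessarily the one in the paper.
    """
    shortfall = min_ctc_input_length(target) - len(source)
    return source + filler * max(0, shortfall)


if __name__ == "__main__":
    # A two-character input cannot produce "aab" under CTC: it needs 4 steps.
    print(min_ctc_input_length("aab"))
    print(pad_for_ctc("ab", "aab"))
```

Inputs that already satisfy the bound are returned unchanged, so the function can be applied to every training pair.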

Taxonomy

Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Topic Modeling