Automated Prediction of Medieval Arabic Diacritics
Khalid Alnajjar, Mika Hämäläinen, Niko Partanen, Jack Rueter

TL;DR
This paper presents a character-level neural machine translation approach, built on an LSTM-based bi-directional RNN, for diacritization of Medieval Arabic. It improves on the online tool used as a baseline and is released as an easy-to-use Python package.
Contribution
It introduces a novel neural model for Arabic diacritization and emphasizes the importance of context size in optimizing prediction accuracy.
Findings
The model outperforms the online tool used as a baseline
The diacritization model is published openly as a Python package on PyPI and Zenodo
Context size should be considered when optimizing the prediction model
Abstract
This study uses a character-level neural machine translation approach trained on a long short-term memory-based bi-directional recurrent neural network architecture for diacritization of Medieval Arabic. The results improve on the online tool used as a baseline. A diacritization model has been published openly through an easy-to-use Python package available on PyPI and Zenodo. We have found that context size should be considered when optimizing a feasible prediction model.
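To make the role of context size concrete, the following is a minimal sketch of how training pairs for a character-level translation model might be prepared. It is not the authors' code: the function names, the `_` word-boundary token, the space-separated character format, and the choice of non-overlapping word windows are all assumptions made for illustration. The source side is the undiacritized text, the target side is the diacritized text, and `context_size` sets how many words are fed to the model at once.

```python
import re

# Arabic diacritic marks (tashkil): U+064B..U+0652, plus dagger alif U+0670.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Remove diacritic marks, leaving the bare consonantal text."""
    return DIACRITICS.sub("", text)


def make_pairs(diacritized: str, context_size: int) -> list[tuple[str, str]]:
    """Build (source, target) pairs from diacritized text.

    Hypothetical format: characters separated by spaces, words separated
    by " _ ", and text cut into non-overlapping windows of `context_size`
    words. These conventions are assumptions, not taken from the paper.
    """
    words = diacritized.split()
    pairs = []
    for start in range(0, len(words), context_size):
        chunk = words[start:start + context_size]
        # Target keeps the diacritics; each mark becomes its own token.
        target = " _ ".join(" ".join(word) for word in chunk)
        # Source is the same chunk with the diacritics stripped.
        source = " _ ".join(" ".join(strip_diacritics(word)) for word in chunk)
        pairs.append((source, target))
    return pairs
```

A larger `context_size` gives the model more surrounding words to disambiguate each form but produces longer sequences, which is one way to read the trade-off the abstract points to.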
Taxonomy
Topics: Natural Language Processing Techniques · Historical and Linguistic Studies · Mathematics, Computing, and Information Processing
