Automated Prediction of Medieval Arabic Diacritics

Khalid Alnajjar; Mika Hämäläinen; Niko Partanen; Jack Rueter

arXiv:2010.05269 · cs.CL · October 13, 2020 · 1 citation

Open Access

TL;DR

This paper presents a character-level neural machine translation approach, built on an LSTM-based bi-directional RNN, for diacritization of Medieval Arabic. It improves on the online tool used as a baseline, and the model is released as an easy-to-use Python package.

Contribution

It introduces a character-level neural model for diacritizing Medieval Arabic and finds that context size should be considered when optimizing a feasible prediction model.

Findings

01  The model improves on the online tool used as a baseline.

02  The diacritization model is published openly as a Python package on PyPI and Zenodo.

03  Context size should be considered when optimizing a feasible prediction model.

Abstract

This study uses a character-level neural machine translation approach trained on a long short-term memory-based bi-directional recurrent neural network architecture for diacritization of Medieval Arabic. The results improve on the online tool used as a baseline. A diacritization model has been published openly through an easy-to-use Python package available on PyPI and Zenodo. We have found that context size should be considered when optimizing a feasible prediction model.
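
To make the character-level translation framing and the role of context size more concrete, here is a minimal sketch of how training pairs for such a model might be prepared. It is an illustration under stated assumptions, not the authors' published pipeline: the function names (strip_diacritics, to_char_tokens, make_pairs), the "_" word-boundary token, and the choice to measure context size in words per chunk are all hypothetical. The source side is the undiacritized text, the target side is the same text with diacritics, and both are split into space-separated characters.

```python
import re

# Arabic tanween, short vowels, shadda and sukun (U+064B..U+0652).
# Assumption: this range is enough for illustration; a real pipeline
# may need to handle further marks.
DIACRITICS = re.compile("[\u064B-\u0652]")


def strip_diacritics(text: str) -> str:
    """Remove diacritic marks, leaving the bare consonantal text."""
    return DIACRITICS.sub("", text)


def to_char_tokens(words: list[str]) -> str:
    """Split words into space-separated characters.

    Words are joined with a hypothetical "_" boundary token so that a
    character-level model can still see where words begin and end.
    """
    return " _ ".join(" ".join(word) for word in words)


def make_pairs(diacritized_text: str, context_size: int) -> list[tuple[str, str]]:
    """Build (source, target) pairs from diacritized text.

    context_size is the number of words per chunk (an assumption about
    how context is measured). Larger chunks give the model more
    surrounding context but produce longer sequences.
    """
    if context_size < 1:
        raise ValueError("context_size must be at least 1")
    words = diacritized_text.split()
    pairs = []
    for start in range(0, len(words), context_size):
        chunk = words[start:start + context_size]
        source = to_char_tokens([strip_diacritics(w) for w in chunk])
        target = to_char_tokens(chunk)
        pairs.append((source, target))
    return pairs


if __name__ == "__main__":
    # Two diacritized words, written with escapes to stay encoding-safe.
    sample = ("\u0643\u064E\u062A\u064E\u0628\u064E "
              "\u0627\u0644\u0652\u0648\u064E\u0644\u064E\u062F\u064F")
    for size in (1, 2):
        print(size, make_pairs(sample, size))
```

Running the sketch with different context_size values shows the trade-off the abstract points to: one-word chunks yield many short pairs with no neighbouring context, while larger chunks yield fewer, longer sequences.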

Taxonomy

Topics: Natural Language Processing Techniques · Historical and Linguistic Studies · Mathematics, Computing, and Information Processing