A Context-Contrastive Inference Approach To Partial Diacritization
Muhammad ElNokrashy, Badr AlKhamissi

TL;DR
This paper presents a novel context-contrastive approach to partial diacritization in Arabic, demonstrating that selective diacritization can enhance readability and proposing new metrics and a Transformer-based model for this task.
Contribution
It introduces CCPD, a new method for partial diacritization that leverages context comparison, along with novel quality indicators and a Transformer-based model called TD2.
Findings
Partially diacritized text can be easier to read than fully diacritized or plain text.
CCPD effectively identifies characters to diacritize by comparing context-aware and context-free inferences.
TD2 shows distinct performance characteristics on new quality indicators compared to existing systems.
Abstract
Diacritization plays a pivotal role in improving readability and disambiguating the meaning of Arabic texts. Efforts have so far focused on marking every eligible character (Full Diacritization). Comparatively overlooked, Partial Diacritization (PD) is the selection of a subset of characters to be marked to aid comprehension where needed. Research has indicated that excessive diacritic marks can hinder skilled readers -- reducing reading speed and accuracy. We conduct a behavioral experiment and show that partially marked text is often easier to read than fully marked text, and sometimes easier than plain text. In this light, we introduce Context-Contrastive Partial Diacritization (CCPD) -- a novel approach to PD which integrates seamlessly with existing Arabic diacritization systems. CCPD processes each word twice, once with context and once without, and diacritizes only the characters…
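The two-pass comparison described above can be illustrated with a minimal sketch. It assumes the selection rule is "keep a mark only where the context-aware and context-free predictions disagree", which is one reading of the Findings summary; the abstract is truncated before it states the rule, so treat this as an assumption. The names `ccpd_select` and `toy_diacritizer` are hypothetical, and the toy model uses Latin placeholder labels rather than real Arabic diacritics.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a diacritizer takes a sequence of words and returns,
# for each word, one diacritic label per character.
Diacritizer = Callable[[Sequence[str]], List[List[str]]]


def ccpd_select(words: Sequence[str], diacritize: Diacritizer) -> List[List[str]]:
    """Keep a context-aware mark only where it differs from the isolated-word mark.

    Assumption: disagreement between the two passes is the selection criterion.
    An empty string means "leave this character unmarked".
    """
    with_context = diacritize(words)  # pass 1: each word sees its sentence
    selected = []
    for i, word in enumerate(words):
        isolated = diacritize([word])[0]  # pass 2: the word on its own
        kept = [
            ctx if ctx != iso else ""
            for ctx, iso in zip(with_context[i], isolated)
        ]
        selected.append(kept)
    return selected


def toy_diacritizer(words: Sequence[str]) -> List[List[str]]:
    """Toy stand-in for a real model: every character gets the label "a",
    except that the final character of any non-initial word gets "i".
    This only mimics a context-dependent ending; it is not linguistically valid.
    """
    output = []
    for j, word in enumerate(words):
        labels = ["a"] * len(word)
        if j > 0 and labels:
            labels[-1] = "i"
        output.append(labels)
    return output


marks = ccpd_select(["ktb", "alwld"], toy_diacritizer)
print(marks)
```

Because the function only calls the diacritizer as a black box, any existing full-diacritization system exposing a comparable interface could be substituted for the toy model, which is consistent with the abstract's claim that CCPD integrates with existing systems.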
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling
