A Morphographemic Model for Error Correction in Nonconcatenative Strings
Tanya Bowden (University of Cambridge), George Anton Kiraz (University of Cambridge)

TL;DR
This paper presents a novel morphological error correction model for nonconcatenative languages such as Arabic and Syriac. The model handles a range of Semitic error types and integrates correction with morphological analysis.
Contribution
It introduces a model, based on a multi-tape formalism, that handles complex morphographemic errors and integrates correction with morphological analysis.
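As a rough illustration of why separate tapes suit nonconcatenative morphology, the toy sketch below assembles a stem from three independent lexical strings: a CV pattern, a consonantal root, and a vocalism. It is not the paper's formalism, which is a multi-tape model rather than a single function; the name `interdigitate`, the slot symbols `C` and `V`, and the rule that the last vowel is reused when the vocalism runs short are all assumptions made for this example.

```python
def interdigitate(pattern: str, root: str, vocalism: str) -> str:
    """Toy merge of three lexical 'tapes' into one surface stem (hypothetical).

    'C' slots in the pattern take root consonants in order, 'V' slots take
    vowels from the vocalism in order (the last vowel is reused once the
    vocalism runs out), and any other character is copied as affix material.
    """
    if pattern.count("C") != len(root):
        raise ValueError("number of C slots must equal the number of root consonants")
    if "V" in pattern and not vocalism:
        raise ValueError("pattern has V slots but the vocalism is empty")
    consonants = iter(root)
    vowel_index = 0
    surface = []
    for slot in pattern:
        if slot == "C":
            surface.append(next(consonants))
        elif slot == "V":
            # Reuse the final vowel when the vocalism is shorter than the V slots
            surface.append(vocalism[min(vowel_index, len(vocalism) - 1)])
            vowel_index += 1
        else:
            # Literal affix material written directly in the pattern
            surface.append(slot)
    return "".join(surface)


# Arabic root k-t-b ("write") under different patterns and vocalisms
print(interdigitate("CVCVC", "ktb", "a"))    # katab
print(interdigitate("CVCVC", "ktb", "ui"))   # kutib
print(interdigitate("maCCVC", "ktb", "a"))   # maktab
```

Because root, pattern, and vocalism stay on separate strings, an error in one of them (for example a wrong vowel) can in principle be described without disturbing the others, which a single concatenated string does not allow.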
Findings
Handles errors in vocalisation, diacritics, and phonetic syncopation, as well as Damerau errors
Addresses morphographemic idiosyncrasies in Semitic languages
Outlines a correction strategy for words that are morphologically sound but morphosyntactically ill-formed
Abstract
This paper introduces a spelling correction system which integrates seamlessly with morphological analysis using a multi-tape formalism. The handling of various error problems specific to Semitic languages is illustrated with Arabic and Syriac examples. The model handles errors in vocalisation, diacritics, phonetic syncopation and morphographemic idiosyncrasies, in addition to Damerau errors. A complementary correction strategy for morphologically sound but morphosyntactically ill-formed words is outlined.
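Damerau errors are single insertions, deletions, substitutions, or transpositions of two adjacent characters. The sketch below shows one simple way to tie such correction to morphological analysis, assuming a plain surface-string approach rather than the paper's multi-tape correction: generate every string one Damerau edit away from the misspelt word and keep only those that a (hypothetical) analyser accepts, returning each with its root, pattern, and vocalism. The lexicon contents and the names `LEXICON`, `damerau_edits`, and `correct` are illustrative, not taken from the paper.

```python
# Hypothetical toy lexicon: surface form -> (root, pattern, vocalism) analysis
LEXICON = {
    "katab": ("ktb", "CVCVC", "a"),
    "kutib": ("ktb", "CVCVC", "ui"),
    "maktab": ("ktb", "maCCVC", "a"),
}


def damerau_edits(word: str, alphabet: str) -> set[str]:
    """All strings one Damerau edit away from `word`: a deletion, an insertion,
    a substitution, or a transposition of two adjacent characters."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    substitutes = {left + ch + right[1:]
                   for left, right in splits if right
                   for ch in alphabet if ch != right[0]}
    inserts = {left + ch + right for left, right in splits for ch in alphabet}
    return deletes | transposes | substitutes | inserts


def correct(word: str) -> list[tuple[str, tuple[str, str, str]]]:
    """Return (candidate, analysis) pairs accepted by the toy analyser."""
    if word in LEXICON:
        return [(word, LEXICON[word])]
    # Candidate alphabet: every letter that occurs in the lexicon's surface forms
    alphabet = "".join(sorted(set("".join(LEXICON))))
    candidates = damerau_edits(word, alphabet)
    return sorted((c, LEXICON[c]) for c in candidates if c in LEXICON)


print(correct("katba"))  # [('katab', ('ktb', 'CVCVC', 'a'))]
print(correct("mktab"))  # [('maktab', ('ktb', 'maCCVC', 'a'))]
```

Filtering candidates through the analyser means every suggestion comes with a morphological analysis attached. This surface-level sketch does not model the vocalisation, diacritic, or syncopation errors that the paper also handles.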
