A Morphographemic Model for Error Correction in Nonconcatenative Strings

Tanya Bowden (University of Cambridge); George Anton Kiraz (University of Cambridge)

arXiv:cmp-lg/9504024 · cmp-lg · February 3, 2008 · 3 cites

TL;DR

This paper presents a morphological error correction model for nonconcatenative languages such as Arabic and Syriac. The model handles vocalisation, diacritic, phonetic and morphographemic errors, and it integrates correction with morphological analysis.

Contribution

It introduces a model based on a multi-tape formalism that handles complex morphographemic errors and integrates spelling correction with morphological analysis.
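In a multi-tape description of Semitic morphology, a stem is spread over separate lexical tapes (a consonantal root, a CV pattern and a vocalism) that are interleaved to give the surface form. The Python sketch below illustrates only that interleaving, using Arabic katab 'wrote' (root k-t-b, pattern CVCVC, vocalism a) in Latin transliteration. The function name `interdigitate`, the transliteration and the rule that a short vocalism spreads its last vowel are simplifying assumptions for illustration. They are not the paper's rule formalism or compiler.

```python
def interdigitate(pattern: str, root: str, vocalism: str) -> str:
    """Interleave three 'tapes' into one surface string (illustrative sketch).

    pattern:  template where 'C' marks a root-consonant slot and 'V' a vowel
              slot; any other character is copied through as affix material.
    root:     consonants filling the C slots, in order.
    vocalism: vowels filling the V slots, in order; if there are more V slots
              than vowels, the last vowel is spread (simplifying assumption).
    """
    if pattern.count("C") != len(root):
        raise ValueError("pattern C slots must match the number of root consonants")
    if "V" in pattern and not vocalism:
        raise ValueError("pattern has V slots but the vocalism is empty")

    consonants = iter(root)
    v_index = 0
    surface = []
    for slot in pattern:
        if slot == "C":
            surface.append(next(consonants))  # next root consonant
        elif slot == "V":
            # next vowel, or the last one again once the vocalism runs out
            surface.append(vocalism[min(v_index, len(vocalism) - 1)])
            v_index += 1
        else:
            surface.append(slot)  # literal affix or fixed vowel in the pattern
    return "".join(surface)


print(interdigitate("CVCVC", "ktb", "a"))    # katab   'wrote'
print(interdigitate("CVCVC", "ktb", "ui"))   # kutib   'was written'
print(interdigitate("maCCuuC", "ktb", ""))   # maktuub 'written'
```

Because the three tapes are kept separate, the root, the pattern and the vowels can each be varied independently while the other two stay fixed.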

Findings

1. Handles vocalisation, diacritic and phonetic syncopation errors
2. Addresses morphographemic idiosyncrasies in Semitic languages
3. Outlines a correction strategy for words that are morphologically sound but morphosyntactically ill-formed

Abstract

This paper introduces a spelling correction system that integrates seamlessly with morphological analysis using a multi-tape formalism. The handling of various Semitic error problems is illustrated with Arabic and Syriac examples. The model handles errors in vocalisation, diacritics, phonetic syncopation and morphographemic idiosyncrasies, in addition to Damerau errors. A complementary correction strategy for words that are morphologically sound but morphosyntactically ill-formed is outlined.
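Damerau errors are the single-character slips classified by Damerau: insertion, deletion, substitution and transposition of two adjacent characters. The Python sketch below generates every string one such edit away from a misspelt word and keeps the candidates that a lexicon accepts. The plain set lookup is a hypothetical stand-in for the paper's morphological analyser. The Latin transliteration, the three-entry lexicon and the function names `damerau_edits1` and `correct` are assumptions made for illustration.

```python
import string


def damerau_edits1(word: str, alphabet: str = string.ascii_lowercase) -> set[str]:
    """All strings exactly one Damerau edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletions = {left + right[1:] for left, right in splits if right}
    transpositions = {
        left + right[1] + right[0] + right[2:]  # swap two adjacent characters
        for left, right in splits
        if len(right) > 1
    }
    substitutions = {
        left + ch + right[1:] for left, right in splits if right for ch in alphabet
    }
    insertions = {left + ch + right for left, right in splits for ch in alphabet}
    return deletions | transpositions | substitutions | insertions


def correct(word: str, lexicon: set[str]) -> list[str]:
    """Return [word] if the lexicon accepts it, else sorted one-edit candidates.

    The set lookup is a hypothetical stand-in for a morphological analyser.
    """
    if word in lexicon:
        return [word]
    return sorted(damerau_edits1(word) & lexicon)


LEXICON = {"katab", "kutib", "maktuub"}  # hypothetical transliterated entries

print(correct("ktaab", LEXICON))  # adjacent letters swapped: ['katab']
print(correct("katb", LEXICON))   # vowel omitted: ['katab']
print(correct("kutab", LEXICON))  # one vowel wrong: ['katab', 'kutib']
```

The last call returns two equally close candidates, and this sketch makes no attempt to rank them.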
