Manually Annotated Spelling Error Corpus for Amharic
Andargachew Mekonnen Gezmu, Tirufat Tesifaye Lema, Binyam Ephrem Seyoum, Andreas Nürnberger

TL;DR
This paper introduces a manually annotated Amharic spelling error corpus for evaluating the detection and correction of non-word and real-word errors, with contextual information that helps in handling both error types.
Contribution
It provides the first comprehensive annotated corpus for Amharic spelling errors, including contextual data for enhanced error detection and correction.
Findings
Corpus includes both non-word and real-word errors.
Contextual information in the corpus makes it useful for handling both error types.
Provides a valuable resource for future NLP research in Amharic.
Abstract
This paper presents a manually annotated spelling error corpus for Amharic, the lingua franca of Ethiopia. The corpus is designed for evaluating spelling error detection and correction. The misspellings are tagged as non-word or real-word errors. In addition, the contextual information available in the corpus makes it useful for dealing with both types of spelling errors.
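The distinction between the two error tags can be illustrated with a minimal Python sketch. It is not the authors' annotation procedure or the corpus file format: the `AnnotatedError` record, its field names, the `error_type` helper, and the toy English lexicon are all hypothetical, chosen only to show why a non-word error can be caught by lexicon lookup while a real-word error needs the surrounding context.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class AnnotatedError:
    """Hypothetical record: a misspelling, its correction, and its context."""
    left_context: List[str]
    misspelling: str
    correction: str
    right_context: List[str]


def error_type(entry: AnnotatedError, lexicon: Set[str]) -> str:
    """Return 'real-word' if the misspelling is itself a valid word,
    otherwise 'non-word'."""
    return "real-word" if entry.misspelling in lexicon else "non-word"


# Toy lexicon (assumption; a real evaluation would use an Amharic word list).
lexicon = {"a", "piece", "peace", "of", "cake", "the", "weather", "is", "nice"}

entries = [
    # "peace" is a valid word, so only the context reveals the error.
    AnnotatedError(["a"], "peace", "piece", ["of", "cake"]),
    # "weathr" is not in the lexicon, so lookup alone flags it.
    AnnotatedError(["the"], "weathr", "weather", ["is", "nice"]),
]

labels = [error_type(e, lexicon) for e in entries]
print(labels)
```

In this sketch the first entry is labeled as a real-word error and the second as a non-word error, which is why stored context matters for evaluating correction of the former.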
Taxonomy
Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Mobile Crowdsensing and Crowdsourcing
