Manually Annotated Spelling Error Corpus for Amharic

Andargachew Mekonnen Gezmu; Tirufat Tesifaye Lema; Binyam Ephrem Seyoum; Andreas Nürnberger

arXiv:2106.13521 · cs.CL · June 28, 2021 · AfricaNLP

TL;DR

This paper introduces a manually annotated Amharic spelling error corpus for evaluating spelling error detection and correction. Misspellings are tagged as non-word or real-word errors and are kept with their surrounding context.

Contribution

It provides a manually annotated corpus of Amharic spelling errors that retains contextual information, so that both non-word and real-word errors can be handled.

Findings

01

Corpus includes both non-word and real-word errors.

02

Contextual information in the corpus makes it useful for dealing with both error types.

03

Designed as an evaluation resource for Amharic spelling error detection and correction.

Abstract

This paper presents a manually annotated spelling error corpus for Amharic, the lingua franca of Ethiopia. The corpus is designed for evaluating spelling error detection and correction. The misspellings are tagged as non-word and real-word errors. In addition, the contextual information available in the corpus makes it useful for dealing with both types of spelling errors.
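
To make the two tags concrete, here is a minimal sketch of how the distinction is commonly drawn: a misspelling that is absent from a reference word list is a non-word error, while a misspelling that is itself a valid word is a real-word error and can only be caught from context. The toy lexicon, the `classify_error` helper, and the English example words are hypothetical stand-ins chosen for illustration; they are not taken from the corpus or its annotation scheme.

```python
# Hypothetical toy lexicon; a real setup would use an Amharic word list.
LEXICON = {"their", "there", "house", "is", "over"}


def classify_error(written, intended, lexicon):
    """Label a (written, intended) pair by the kind of spelling error.

    Assumption: a form counts as a valid word iff it appears in `lexicon`.
    """
    if written == intended:
        return "correct"
    if written in lexicon:
        # A valid word in the wrong place: only context reveals the error.
        return "real-word"
    # The written form is not a word at all.
    return "non-word"


# (written, intended) pairs; illustrative only.
examples = [("hous", "house"), ("their", "there"), ("over", "over")]
labels = [classify_error(w, i, LEXICON) for w, i in examples]
print(labels)
```

A dictionary lookup alone can flag the first pair but not the second, which is why a corpus that keeps the surrounding context is needed to evaluate real-word error handling.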

Code & Models

Repositories

andmek/ErrorCorpus (official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Mobile Crowdsensing and Crowdsourcing