Automatic Real-word Error Correction in Persian Text

Seyed Mohammad Sadegh Dashti; Amid Khatibi Bardsiri; Mehdi Jafari; Shahbazzadeh

arXiv:2407.14795·cs.CL·July 23, 2024

TL;DR

This paper presents a multi-tiered approach to real-word error correction in Persian text that combines semantic analysis with classifiers. It reports a 96.6% F-measure for detection and 99.1% accuracy for correction, outperforming previous Persian error correction models.

Contribution

The paper introduces a new structured method combining semantic similarity, feature selection, and classifiers specifically tailored for Persian language error correction.

Findings

01 Achieved 96.6% F-measure in detection
02 Attained 99.1% accuracy in correction
03 Outperformed previous Persian error correction models
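
For readers unfamiliar with the detection metric, the sketch below shows how an F-measure is commonly computed from precision and recall. It assumes the balanced F1 variant (beta = 1); this page does not say which variant the paper uses. The function names and the counts in the usage note are hypothetical and are not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def detection_scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from true-positive, false-positive,
    and false-negative counts of flagged real-word errors."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f_measure(precision, recall)
```

As a usage example with made-up counts, `detection_scores(8, 2, 2)` gives precision 0.8 and recall 0.8, so the F1 score is also 0.8.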

Abstract

Automatic spelling correction stands as a pivotal challenge within the ambit of natural language processing (NLP), demanding nuanced solutions. Traditional spelling correction techniques are typically only capable of detecting and correcting non-word errors, such as typos and misspellings. However, context-sensitive errors, also known as real-word errors, are more challenging to detect because they are valid words that are used incorrectly in a given context. The Persian language, characterized by its rich morphology and complex syntax, presents formidable challenges to automatic spelling correction systems. Furthermore, the limited availability of Persian language resources makes it difficult to train effective spelling correction models. This paper introduces a cutting-edge approach for precise and efficient real-word error correction in Persian text. Our methodology adopts a…
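
The distinction the abstract draws between non-word and real-word errors can be illustrated with a minimal sketch. It uses a toy English lexicon for readability (the paper itself targets Persian), and `LEXICON` and `non_word_errors` are hypothetical names chosen for this example. This is not the paper's method; it only shows why plain dictionary lookup misses real-word errors.

```python
# Toy lexicon, purely illustrative; a real system would use a full dictionary.
LEXICON = {"i", "want", "a", "piece", "peace", "of", "cake"}


def non_word_errors(sentence: str) -> list[str]:
    """Return the tokens that are absent from the lexicon.

    This catches misspellings that produce non-words, but cannot flag
    a valid word that is wrong for its context.
    """
    return [tok for tok in sentence.lower().split() if tok not in LEXICON]


# "peice" is not a word, so dictionary lookup flags it.
typo_hits = non_word_errors("I want a peice of cake")

# "peace" is a valid word used in the wrong context, so nothing is flagged.
real_word_hits = non_word_errors("I want a peace of cake")
```

Here `typo_hits` contains `"peice"`, while `real_word_hits` is empty even though the second sentence is wrong. Detecting that case requires looking at the surrounding context, which is the problem the paper addresses.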
